EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT 





Volume 1 APRIL, 1941 Number 2





TABLE OF CONTENTS 
PRIMARY MENTAL ABILITIES OF CHILDREN
Thelma G. Thurstone

A STATISTICAL EVALUATION OF CLINICAL COUNSELING
E. G. Williamson and E. S. Bordin

CONTRIBUTION OF TESTS TO RESEARCH IN THE FIELD OF STUDENT
PERSONNEL WORK
Ralph W. Tyler

GRADE AND AGE NORMS FOR THE MINNESOTA VOCATIONAL TEST
FOR CLERICAL WORKERS
Gwendolen G. Schneidler

EXAMINING EXAMINERS
Norman J. Powell

NEW CRITERIA FOR OLD
T. R. Sarbin and E. S. Bordin

A FACTOR ANALYSIS OF A NON-VERBAL REASONING TEST
Robert I. Blakey

NEW TESTS








Copyright, 1941, by 


SCIENCE RESEARCH ASSOCIATES 


PRINTED IN THE UNITED STATES OF AMERICA 




















PRIMARY MENTAL ABILITIES OF CHILDREN¹


THELMA G. THURSTONE 
Chicago Teachers College 


FOR MANY years psychologists have been accustomed to
the problems of special abilities and disabilities. These
are, in fact, the principal concern of the school psychologists
who deal with children who cannot read, have a blind spot for
numbers, or do one thing remarkably well and other things
poorly. It seems strange with all this experience in differential
psychology that we have clung so long to the practice of
summarizing a child’s mental endowment by a single index,
such as the mental age, the intelligence quotient, the percentile
rank in general intelligence, and other single average measures.
An average index of mental endowment should be useful
for many educational purposes, but it should not be regarded
as more than the average of several tests. Two children
with the same mental age can be entirely different persons,
as is well known. There is nothing wrong about using
a mental age or an intelligence quotient if it is understood
as an average of several tests. The error that is frequently
made is interpreting it as measuring some basic functional
unity when it is known to be nothing more than a composite
of many functional unities.


The researches on the primary mental abilities which have
been in progress for several years have had as their first
purposes the identification and definition of the independent
factors of mind. As the nature of the abilities became more
clearly indicated by successive studies, a second purpose of a
more practical nature has been involved in some of the studies.
This purpose has been to prepare a set of tests of psychological
significance and practicable adaptability to the school
testing and guidance program. The series of studies will be
summarized in this paper, the battery of tests soon to be
available will be described, and some of the problems now
being investigated will be discussed briefly.

¹ The studies reported in this paper have been carried out under the joint
sponsorship of the Chicago Public Schools, the University of Chicago, and the
American Council on Education.


Previous Studies 


The first study in this series involved the use of 56 psy- 
chological examinations that were given to a group of about 
250 college students. That study revealed a number of pri- 
mary abilities, some of which were clearly defined by the 
configuration of test vectors while others were indicated by 
the configuration but less clearly defined. All of these factors 
have been studied in subsequent test batteries in which each 
primary factor has been represented by new tests specially 
designed to feature the primary factors in the purest possible 
form. The object has been to construct tests in which there 
is a heavy saturation of a primary factor and in which other 
factors are minimized. This is the purification of tests by 
reducing their complexity. 


These latter studies of the separate abilities were in each 
case made in the Chicago high schools—one study emphasizing 
the perceptual factor at the Lane Technical High School, 
one study of the inductive factor at the Hyde Park High 
School, an intensive study of the memory factor or factors 
in four high schools, and a study of numerical ability by 
Coombs in six high schools. In each series of tests, one factor 
was represented by a large number of tests, but all factors 
were well represented. In all of these studies the same primary
abilities were identified as had been found in the experiment
with college students. These studies led to the publication
by the American Council on Education of an experimental
battery of tests for the primary mental abilities, adaptable 
for use with students of high school or college age. 




The identification of the same primary mental abilities 
among high school students as we had previously found among 
college students encouraged us to look for differentiation 
among the abilities of younger children. In the Chicago Pub- 
lic Schools, group mental tests are made of all 1B, 4B, and 
8B children in the elementary schools and of 10B students 
in the high schools. The demand for a series of tests to be 
used in the guidance program for high school entrants and 
the advisability of not making too broad a leap in age led 
us to select an eighth-grade population for the next study. 


The Eighth-Grade Experiment 


In view of the purpose of investigating whether or not 
primary mental abilities could be isolated for children at the 
fourteen-year age level, the construction of the tests consisted 
essentially in the adaptation for the younger children of tests 
previously used with high school students. In some of the 
tests little or no alteration was necessary, while for other 
tests it was considered advisable to revise vocabulary and 
other aspects of the tests to suit the younger age level. A
number of new tests were added to those selected from previous
experimental batteries. Sixty tests constituted the final
battery.


When the tests had been designed and printed, they were 
given in a trial form to children in grades 7A and 8A in 
several schools. Groups of from 50 to 100 children in these 
two grades were used for the purpose of standardizing pro- 
cedures and, especially, for setting time limits. 


Fifteen Chicago elementary schools were selected by Miss 
Minnie L. Fallon, Assistant Superintendent in charge of ele- 
mentary education, and by Dr. Grace E. Munson, Director 
of the Bureau of Child Study, as experimental schools for this 
study. The tests in the main investigation were administered 
in the schools by the adjustment teachers. These adjustment
teachers had had special training in testing procedures with
the Bureau of Child Study and also had had considerable
experience in giving psychological and educational tests. Special
instructions in the procedures for these tests were given
to the adjustment teachers, as well as written instructions for
each day’s testing program.


Eleven hundred and fifty-four children participated in this 
study. The complete battery of 60 tests was given in 11 one- 
hour sessions to the children in the 8B grades in each of the 
15 schools. The children enjoyed the tests and, with very few 
exceptions, the sustained interest and effort were quite evident. 
One thing which a psychologist might fear in such a long 
series of tests would be fluctuating motivation on the part of
the students. Although the adjustment teachers administered 
the tests, every session was observed by a member of our 
staff, and we were highly gratified by the sustained interest and 
effort of the pupils. 


In addition to the 60 tests we used three more variables: 
chronological age, mental age, and sex. The mental-age data
were available in school records. They were determined by 
the Kuhlmann-Anderson tests which had been given previously 
to the same children. Therefore, the battery to be analyzed 
factorially contained 63 variables. 


The total population in this study consisted of 1,154 
eighth-grade children. When all the records had been as- 
sembled, it was found that 710 of these subjects had complete 
records for all of the 63 variables. We decided to base our 
correlations on this population of complete records rather 
than to use the large population with varying numbers of cases 
for the correlation coefficients. For convenience of handling 
with the tabulating-machine methods, the raw scores were
transmuted into single digit scores from which the Pearson 
product-moment correlation coefficients were computed. With 
63 variables there were 1,953 Pearson correlation coefficients.
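
The count of coefficients follows from the number of distinct pairs among n variables, n(n-1)/2, which for 63 variables is 1,953. The sketch below, in Python, shows that count and the product-moment formula. It is an illustration only, not the tabulating-machine procedure of the study; the names `n_pairs` and `pearson_r` and the single-digit sample scores are assumptions made for the example.

```python
from math import sqrt


def n_pairs(n_variables):
    """Number of distinct pairs (correlations) among n variables."""
    return n_variables * (n_variables - 1) // 2


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical single-digit scores for five children on two tests.
test_a = [1, 3, 4, 6, 8]
test_b = [2, 3, 5, 5, 9]

print(n_pairs(63))
print(round(pearson_r(test_a, test_b), 3))
```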


This table of intercorrelations was factored to 10 factors 
by the centroid method on the tabulating machines by means 
of punched cards. Successive rotations made by the method
of extended vectors yielded an oblique factorial matrix which
is a simple structure. 
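
For readers unfamiliar with the centroid method, the extraction of the first centroid factor can be sketched as follows: each loading is the sum of a column of the correlation matrix divided by the square root of the sum of all its entries. The Python sketch below is a simplified illustration, not the procedure as run on the tabulating machines. The function name `first_centroid_factor`, the default communality estimate (highest off-diagonal correlation in each column), and the four-test example are assumptions; sign reflection for later factors and the rotation to simple structure are omitted.

```python
from math import sqrt


def first_centroid_factor(r, communalities=None):
    """Return (loadings, residual matrix) for the first centroid factor.

    r is a square correlation matrix as a list of lists. The diagonal is
    replaced by communality estimates; if none are supplied, the highest
    absolute off-diagonal entry of each column is used (an assumed convention).
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        if communalities is not None:
            m[j][j] = communalities[j]
        else:
            m[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    total = sum(sum(row) for row in m)
    # Loading of test j = column sum / square root of the grand sum.
    loadings = [sum(m[i][j] for i in range(n)) / sqrt(total) for j in range(n)]
    # Residuals are what remains for the extraction of further factors.
    residual = [[m[i][j] - loadings[i] * loadings[j] for j in range(n)]
                for i in range(n)]
    return loadings, residual


# Hypothetical four-test battery generated by a single factor.
true_loadings = [0.8, 0.7, 0.6, 0.5]
r = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
      for j in range(4)] for i in range(4)]

loadings, residual = first_centroid_factor(
    r, communalities=[a * a for a in true_loadings])
print([round(a, 3) for a in loadings])
```

With the true communalities in the diagonal, a battery generated by one factor is reproduced exactly and the residuals vanish; with estimated communalities the recovery is only approximate.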


Inspection of the rotated factorial matrix showed seven 
of the factors previously indicated: Memory, Induction, Ver- 
bal Comprehension, Word Fluency, Number, Space, Percep- 
tual Speed, and three less easily identifiable factors. One 
of these is another Verbal factor; one is involved in ability 
to solve pencil mazes; and one is present in the three dot- 
counting tests which were used. 


We have computed the intercorrelations between the 10 
primary factors. Our main interest centers on the seven 
primary factors that can be given interpretation and, espe- 
cially, on the first six of these factors for which the interpre- 
tation is rather more definite. Among the high correlations 
we note that the Number factor is correlated with the two 
Verbal factors. The Word Fluency factor has high correlation 
with the Verbal Comprehension factor and with Induction. 
The Rote Memory factor seems to be independent of the other 
factors. These correlations are higher than the correlations 
between primary factors for adults. 


Because of the psychological interest in the correlations 
of the primary mental abilities, we have made a separate 
analysis of the correlations for those factors which seem to 
have reasonably certain interpretation. If these six primary 
mental abilities are correlated because of some general intel- 
lective factor, then the rank of the correlation matrix should 
be one. Upon examination, this actually proves to be the case.
A single factor accounts for most of the correlations between 
the primary factors. 
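
The rank-one condition can be illustrated with Spearman's tetrad differences: if a single general factor accounts for the correlations, then every correlation is the product of two loadings, and each tetrad difference r_ij r_kl - r_il r_kj vanishes. The following Python sketch checks this on an invented example. The loadings assigned to the six primaries are hypothetical values chosen only so that Reasoning is highest and Memory lowest; they are not the values obtained in the study, and the function name `max_tetrad_difference` is likewise an assumption.

```python
from itertools import permutations


def max_tetrad_difference(r):
    """Largest absolute tetrad difference over all sets of four distinct tests.

    Only off-diagonal correlations enter, so the diagonal entries are ignored.
    """
    n = len(r)
    return max(abs(r[i][j] * r[k][l] - r[i][l] * r[k][j])
               for i, j, k, l in permutations(range(n), 4))


# Hypothetical loadings of six primaries on a second-order general factor.
general = {"V": 0.7, "W": 0.6, "N": 0.6, "S": 0.5, "M": 0.4, "R": 0.8}
g = list(general.values())

# Correlations implied by a single general factor: r_ij = g_i * g_j.
rank_one = [[1.0 if i == j else g[i] * g[j] for j in range(6)]
            for i in range(6)]

# The same table with one correlation disturbed, so that one factor
# no longer accounts for all of the correlations.
disturbed = [row[:] for row in rank_one]
disturbed[0][1] = disturbed[1][0] = rank_one[0][1] + 0.2

print(max_tetrad_difference(rank_one))
print(max_tetrad_difference(disturbed))
```

In observed data the tetrad differences are never exactly zero, so in practice one asks whether they are small relative to their sampling errors.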


The single factor loadings show that the inductive factor 
has the highest loading and the Rote Memory factor the lowest 
loading on the common general factor in the primary abilities. 
This general factor is what we have called a second-order 
general factor. It makes its appearance not as a separate factor,
but as a factor inherent in the primaries and their correlations.
If further studies of the primary mental abilities of
children should reveal this general factor, it may sustain 
Spearman’s contention that there exists a general intellective 
factor. Instead of depending on the averages or centroids 
of arbitrary test batteries for its determination, the present 
method should enable us to identify it uniquely. 


We have not been able to find in these data a general 
factor that is distinct from the primary factors, but the sec- 
ond-order general factor should be of as much psychological 
interest as the more frequently postulated, independent gen- 
eral factor of Spearman. It would be our judgment that 
the second-order general factor found here is probably the 
general factor which Spearman has so long defended, but we 
cannot say whether he would accept the present findings as 
sustaining his contentions about the general factor. We have 
not found any occasion to debate the existence of a general 
intellective factor. The factorial methods we have been using 
are adequate for finding such a factor, either as a factor inde- 
pendent of the primaries or as a factor operating through 
correlated primaries. We have reported on primary mental 
abilities in adults, which seem to show only low positive cor- 
relations except for the two verbal factors. In the present 
study we have found higher correlations among the primary 
factors for eighth-grade children. It is now an interesting 
question to determine whether the correlations among primary 
abilities of still younger children will reveal, perhaps even 
more strongly, a second-order general factor. 


Interpretation of Factors 


The analysis of this battery of 60 tests revealed essentially 
the same set of primary factors which had been found in 
previous factorial studies. Six of the factors seemed to have 
sufficient stability for the several age levels that have been 
investigated to justify an extension of the tests for these fac- 
tors into practical test work in the schools. In making this 
extension we have been obliged to consider carefully the
difference between research on the nature of the primary factors
and the construction of tests for practical use. Several
of the primary factors are not yet sufficiently clear as regards 
psychological interpretation to justify an attempt to appraise 
them generally among school children. The primary factors 
that do seem to be clear enough for such purposes are the 
following: Verbal Comprehension V, Word Fluency W, 
Number N, Space S, Rote Memory M, and Induction or Rea- 
soning R. The factors which in several studies are not yet 
sufficiently clear for general application are the Perceptual 
factor P and the Deductive factor D. 


The Verbal factor V is found in tests involving verbal 
comprehension, for example, tests of vocabulary, opposites 
and synonyms, completion tests, and various reading compre- 
hension tests. 


The Word Fluency factor W is involved whenever the 
subject is asked to think of isolated words at a rapid rate. 
It is for this reason that we have called the factor a Word 
Fluency factor. It can be expected in such tests as anagrams, 
rhyming, and producing words with a given initial letter, 
prefix, or suffix. 


The Space factor S is involved in any task in which the 
subject manipulates an object imaginally in two or three dimen- 
sions. The ability is involved in many mechanical tasks and 
in the understanding of mechanical drawings. Such material 
cannot be used conveniently in testing situations, so we have 
used a large number of tasks which are psychologically similar, 
such as Flags, Cards, and Figures. 


The Number factor N is involved in the ability to do 
numerical calculations rapidly and accurately. It is not depen- 
dent upon the reasoning factors in problem-solving, but seems 
to be restricted to the simpler processes, such as addition and 
multiplication. 


A Memory factor M has been clearly present in all test 
batteries. The tests for memory which are now being used 
depend upon the ability to memorize quickly. It is quite
possible that the Memory factor will be broken down into
more specific factors. 































The Reasoning factor R is involved in tasks that require
the subject to discover a rule or principle covering the mate- 
rial of the test. The Letter Series and Letter Grouping 
tests are good examples of the task. In all these experimental 
studies two separate Reasoning factors have been indicated. 
They are perhaps Induction and Deduction, but we have not 
succeeded in constructing pure tests of either factor. The tests 
which we are now using are more heavily saturated with the 
Inductive factor, but for the present we are simply calling 
the ability R, Reasoning. 





In presenting for general use a differential psychological 
examination which appraises the mental endowment of children,
it should not be assumed that there is anything final
about six primary factors. No one knows how many primary
mental abilities there may be. It is hoped that future factorial
studies will reveal many other important primary abilities so
that the mental profiles of students may eventually be adequate
for appraising educational and vocational potentialities.
In such a program the present studies are only a starting 
point in substituting for the description of mental endowment 
by a single intelligence index the description of mental endow- 
ment by a profile of fundamental traits. 


The Final Test Battery 


In adapting the tests for practical use in the schools for 
the appraisal of six primary mental abilities, we must recog- 
nize that the new test program has for its object the produc- 
tion of a profile for each child, as distinguished from the 
description of a child’s mental endowment in terms of a single 
intelligence index. For many educational purposes it is still
of value to appraise a child’s mental endowment roughly by 
a single measure, but the composite nature of such single 
indices must be recognized. 




The factorial matrix of the battery of sixty tests was 
inspected to find the three best tests for each of seven primary 
factors. In making the selection of tests for each primary fac- 
tor we considered not only the factorial saturations of the 
tests, which are, of course, the most important consideration, 
but also the availability of parallel forms which may be needed 
in case the tests should come into general use. Ease of ad- 
ministration and ease in understanding of the instructions are 
also important considerations. 


The three tests for each primary factor were printed in a 
separate booklet and the material was so arranged that the 
three tests for any factor could be given easily within a 40- 
minute school period. The main purpose of the larger test 
battery was to determine whether or not the primary factors 
could be found for eighth-grade children, but the purpose of 
the present shorter battery was to produce a practical, useful 
test battery and to check its factorial composition. The se- 
lected tests were edited and revised so that they could be used 
for either hand-scoring or machine-scoring. The Word Flu- 
ency tests constitute an exception in that none of the tests 
now known to be saturated with this factor seems to be suit- 
able for machine-scoring. 


In order to check the factorial analysis at the present age 
level, we arranged to give the selected list of 21 tests to a 
second population of eighth-grade children. The resulting 
data were factored independently of the larger battery of 
tests. There were 437 subjects in this population who took 
all of the 21 tests. This population was used for a new factor 
analysis. The results of this analysis clearly confirmed the 
previous study. The simple structure in the present battery
is sharp, with only one primary factor conspicuously present 
in each test, so that the structure could be determined by 
inspection for clusters. 


A battery of 17 tests has been assembled into a series of 
test booklets for use in the Chicago schools. An experimental 
edition of 25,000 copies has been printed, and the plan for
securing norms on these tests includes their administration to
1,000 children at each half-year grade level from grade 5B 
through the senior year in high school. These records have 
been obtained during the school year 1940 to 1941. The use 
of such a wide age range in standardizing the test is at first 
thought, perhaps, rather strange. The effort was made in 
order to secure age norms throughout the entire range of 
abilities found among eighth-grade children since the tests 
are to become a part of the testing procedure for all 8B 
children in the Chicago schools. Separate age norms will be 
derived for each of the six primary abilities. If a single index 
of a student’s mental ability is desired, it is recommended 
that the average of his six ability scores be used. 
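
The recommendation may be illustrated by a small Python sketch in which two children have the same average but quite different profiles. The scores, the common score scale, and the names `single_index`, `child_a`, and `child_b` are invented for the example and do not come from the norms described here.

```python
from statistics import mean

FACTORS = ("V", "W", "N", "S", "M", "R")


def single_index(profile):
    """Average of the six primary ability scores, used as a single index."""
    return mean(profile[f] for f in FACTORS)


# Hypothetical profiles on a common score scale.
child_a = {"V": 60, "W": 55, "N": 40, "S": 45, "M": 50, "R": 50}
child_b = {"V": 40, "W": 45, "N": 60, "S": 55, "M": 50, "R": 50}

print(single_index(child_a), single_index(child_b))
# Factors on which the two profiles differ despite the equal averages.
print([f for f in FACTORS if child_a[f] != child_b[f]])
```

The single index is identical for the two children, while the profiles show one to be stronger in the verbal factors and the other in Number and Space.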


As soon as the norms are established, the tests will be 
published by the American Council on Education under the 
title “Chicago Primary Mental Abilities Tests.” It is expected
that the tests will be ready for distribution during the sum- 
mer of 1941. The norms provided with the tests will be of 
a wide enough range to make the tests useful at the high 
school and upper grade levels. 


The complete test program consists of 17 tests, all of 
which have been reduced to machine-scoring form except the 
three tests for the Word Fluency factor W. In the nature 
of the case there seem to be difficulties in reducing this test to 
machine-scoring form, and hence it has been retained in hand- 
scoring form. It should be said, however, that the W tests 
can be scored almost as fast, if not as fast, as the tests which 
are machine-scored. Since all of the tests can be hand-scored, 
their use is not limited to schools large enough to avail 
themselves of the scoring machine. The hand-scoring of all 
the tests is very easily accomplished by the use of perforated 
stencils to be provided with the tests. Hand-scoring is facilitated
by the use of the scoring board distributed by the
Stoelting Company. 


The new battery represents six primary mental abilities, 
namely, Verbal Comprehension V, Space S, Number N, Memory M,
Word Fluency W, and Reasoning R. They enable the
skilled psychologist to tabulate a profile of six linearly inde- 
pendent scores instead of a single measure, such as the 
intelligence quotient. 

Principals, teachers, adjustment teachers, and school
psychologists have expressed their satisfaction with the profile 
of abilities plotted for each child. Probably the children 
themselves have found the profiles most interesting and have 
profited most from an examination of their own profiles. In 
the school year 1941-1942, these tests will be installed as a 
part of the educational guidance program in the Chicago
schools by administering them regularly to 8B elementary 
school pupils and 10B high school pupils. 

Some of the features of the tests should be mentioned. 
The tests are so arranged that machine-scoring and hand-scor- 
ing tests are directly comparable and will have the same 
norms. The child’s task does not vary with the type of scor- 
ing; only the scorer’s job is changed. Another feature is the 
use of fore-exercise booklets printed on yellow paper. The 
time limits for the practice exercises are approximate. When 
a test proper is started, the student places his white test book- 
let on top of his yellow practice booklet, and the examiners 
and proctors can check at a glance that every child is work- 
ing in the right place. The tests proper are to be timed 
exactly. The three tests of each of the six abilities are 
arranged in a booklet for administration within a 40-minute 
school period. It is recommended that the successive booklets 
be given on successive school days. 


Further Problems 


One of our principal research interests at the present
time is to determine whether primary abilities can be identi- 
fied in children of kindergarten or first-grade age. A series 
of about 50 tests is well under way, and some of them are 
now being tried with young children. If we succeed in isolat- 
ing primary abilities among these young children, our next 
step will be to prepare a practical battery of tests for that
age. A subsequent problem will be to make experimental
studies of paper-and-pencil tests for appraising the primary 
abilities of children in the intermediate grades, approximately 
at the fourth-grade level. We are fairly confident that such 
tests can be prepared for use in the intermediate grades. 

It is a long way in the future, but it is interesting to spec- 
ulate on the possibility of using the tests of the primary 
mental abilities as the tool with which to study fundamental 
psychological problems of mental growth and mental inherit- 
ance. Absolute scaling of the tests at the different age levels 
will make possible studies on the rates of development of the 
separate abilities at various age levels. Modifiability of the 
abilities will be another problem to which we shall later 
turn attention. 
















A STATISTICAL EVALUATION OF 
CLINICAL COUNSELING¹


E. G. WILLIAMSON AND E. S. BORDIN 


University of Minnesota 


SYSTEMATIC efforts at evaluation are a relatively recent
development in the field of counseling. The form of 
appraisal has ranged from “verbal research” to simple statis- 
tical analysis. Not all of these studies have avoided the pit- 
falls, of which there are many, to be found in this undertak- 
ing. The assumptions, methods, and weaknesses involved in 
the various evaluation approaches are summarized in a pre- 
vious paper (10). 

The present paper, one of a number to be reported, sum- 
marizes an experiment designed to evaluate a certain type of 
counseling. Since our conclusions are applicable only to coun- 
seling based upon the philosophy and procedures employed 
at the Testing Bureau of the University of Minnesota, this 
type of clinical counseling should be described. 

This clinical counseling has as its purpose assisting the 
student to choose and make progress toward educational and 
vocational objectives which will yield maximum satisfaction. 
It is assumed that this end can be accomplished best by aiding 
him to set his aspirations in terms of the level of his poten- 
tialities. Naturally his potentialities must first be analyzed 
before a diagnosis of any discrepancy between aspiration and 
ability can be made and before assistance can be forthcoming 
from the counselor.





¹ Assistance in the preparation of this material was furnished by the personnel
of Work Projects Administration, Official Project No. 65-1-71-140,
Sub-Project No. 93.




The case data upon which analyses and diagnoses are made 
consist of standardized ability tests, personality and interest 
inventories, questionnaire records, and non-quantified information
collected from the student, his associates, and his parents.
This accumulation of information must be interpreted on the 
basis of an integrated picture of the individual provided 
through personal interviews. In other words the counselor 
deals with a unique individual rather than with a generalized 
conception of a group of individuals. 

The interview provides the medium through which coun- 
seling is personalized and through which the student is assisted 
in making his decisions. While the decisions that the student 
accepts are and should be his own, the counselor sometimes 
plays a persuasive role in that he organizes relevant case data 
to highlight the alternative courses of action from which the 
student chooses. Once the student has made the choice, the
counselor has the task of aiding him to orient himself to his 
interests, attitudes, and abilities, and his environment, home 
and family, recreation, and education for the most successful 
achievement of the chosen objective. 

A fuller description of this clinical counseling process has 
been presented elsewhere (9, 11). Only by means of an 
accurate conception of what the counselor is doing and what 
he is trying to do can any evaluation of that counseling be 
meaningful. Moreover, we should not attempt to generalize 
our conclusions to include any other type of counseling than 
the one studied. 

In attempting to evaluate this clinical counseling, we be- 
lieve that a criterion flexible enough to avoid artificial frag- 
mentation of the individual provides the most adequate design 
for experimentation. Essentially such a design involves a 
judgmental comparison of the individual’s adjustment status 
before and after counseling. This method—essentially the 
non-statistical weighting of variables to form a composite 
estimate—was used in a previously reported study (11: chap. 
IX). In this study an estimate of the degree of the student’s 
cooperation was used as a means of control. The control
lies in the comparison of those students who did with those
who did not follow the counselor’s recommendations.


The process of making these evaluative judgments involves 
three phases: (a) the preliminary review or analysis of 
the case data; (b) the follow-up interview; (c) the case 
evaluation. 


In the first phase of the experiment, all student cases were 
independently and critically read by two trained workers whose 
functions were to analyze and record all information con- 
tained in the case folder. Any discrepancies between the anal- 
yses of the two readers were reconciled or adjusted in con- 
ference with the staff members concerned with the project. 
The case reviewers also compiled questions concerning the 
present status of the student, his adjustment to his original 
problem or problems, his adjustment to the counsel given, and 
any other pertinent information. These questions were used 
subsequently in the follow-up interview. 


For the second step all student cases were called in for 
a follow-up interview. Cases which were incomplete because 
of insufficient interview contacts or incomplete test battery 
and which could not be reached for follow-up interviewing 
on the campus were reached either through a questionnaire 
or an interview in the home. For those students who had 
left the University, information was collected, and used, con- 
cerning their adjustment to their jobs and their satisfaction 
with that out-of-school adjustment. The follow-up interview 
yielded information concerning the extent of success or failure 
achieved by the student in solving each of his original prob- 
lems and the extent to which the counselor’s advice had been 
followed subsequent to the original counseling interviews. The 
student’s own statement of the degree of his satisfaction with 
his solution of the problems and with University Testing Bu- 
reau counseling and recommendations or any other interpre- 
tations or evaluations that he made were specifically recorded. 
The interviewer did not interpret or evaluate this informa- 
tion. The purpose of the follow-up interview was, essentially, 
to obtain the factual data on the present status of the case. 




Trained evaluators next critically reviewed the original 
case data and the follow-up interview report to arrive at a 
judgment of the extent to which the student had adjusted the 
problems for which he had originally sought counseling. The 
effectiveness of the counseling was evaluated in terms of the 
following counseling functions or services: 

1. Diagnosis of the student’s vocational and educational 
possibilities. 

2. Advice in making appropriate choice of a vocational 
field and in securing the related educational training sequences. 

3. Counseling as to recognition of, and alleviation of, 
disturbing factors (emotional, educational, economic, health) 
which may interfere with or prevent the acceptance of proper 
vocational choice and the achievement of appropriate training. 

4. Assistance in the discovery and utilization of personal 
resources in effecting an adjustment. 

5. Guidance in the use of all University personnel re- 
sources in diagnosis and counseling. 

The student’s adjustment with regard to vocational choice 
and his progress toward achieving satisfactory training for 
that choice were judged by means of the following criteria:

1. Choices made in line with aptitudes, interests, work 
habits, personality, etc. 

2. Program of studies in line with these choices and the 
student’s qualifications. 

3. The student’s satisfaction with vocational choice. 

4. Progress in achieving training for objectives in terms 
of the capacity of the student to profit from such training. 

5. Alleviation of factors which interfered with the mak- 
ing of a satisfactory vocational choice and with acquiring 
the necessary training, e.g., parental dominance of choice,
inadequate study skills, etc. 

In making their appraisals the evaluators studied the stu- 
dent’s interests and aptitudes, the counselor’s interview 
notes, the student’s reported comments, and his grade record 
achieved before and after counseling. All of the information 
was weighed and balanced with reference to the five criteria
before a judgment was made of the degree of adjustment
achieved by the student subsequent to counseling. The degree 
of cooperation was independently judged in the same manner. 

The following five categories² formed the scale of
adjustment: 


Satisfactory Adjustment—1. The student is satisfied with 
his vocational adjustment at the time of the follow-up inter- 
view. In some cases the student’s dissatisfaction will not be 
a deterrent to a rating of satisfactory adjustment. In instances 
where the student’s aspirations are far above his level of abil- 
ity, he is considered satisfactorily adjusted if he accepts the 
fact that his ambitions must be pitched at a lower level. 

2. In the interviewer’s judgment the vocational choice 
and adjustment of the student are adequate, based upon 
aptitudes, interests, and subjective factors revealed through 
interviews. 

3. There has been an alleviation of distracting factors 
which interfere with vocational choice and professional train- 
ing such as inadequate socialization, mental conflicts, financial 
problems, health handicaps, and any other problems. 

4. Achievement in a given training program is commensu- 
rate with aptitudes and interests. 

Some Progress Toward Adjustment—The student has 
not yet reached a satisfactory adjustment, according to the 
previously stated criteria, but is evidently started on the road 
and may eventually reach the desired objective. He may have 
come to the counselor with a number of problems involving 
vocational choice, classification in college classes, and social 
adjustment and personal peculiarities. In the follow-up inter- 
view it may be found, for example, that he has succeeded in 
settling his vocational question but that he is still struggling 
for mastery in regard to social adjustments. 

No Change—This classification is used for those cases in 
which the problems remain the same as at the time of counsel- 
ing. While the passage of time will usually make a problem 
more serious, the designation of “slightly worse” was not
applied unless a choice point had actually been passed. Thus
a sophomore who has not yet made a vocational choice would
be classed as unchanged, but a junior without a vocational
choice would be in a more serious position and therefore would
be classified as slightly worse. Juniors should have begun
specialization if they are to make “normal” progress toward
graduation.

² Described and illustrated in an earlier study. See reference 11: chap. IX.

Slightly Worse—This is a condition in which the solution 
of the original problems seems slightly more remote and the 
factors which existed at the time of the first counseling contact
still exist and are accentuated. 

Much Worse—Those cases where the student’s problems
are more severe and the solution much more remote or less 
probable of achievement. 

The judgments of the degree of cooperation were based 
upon the following categories: 


Followed advice wholly—The student followed the coun- 
selor’s advice with respect to the most dominant or important 
original problems. 

Followed advice in part—The student either partially fol- 
lowed the counselor’s advice with respect to the chief problems 
or completely followed advice with respect to some but did 
not follow advice with respect to others. 


Did not follow advice—The student did not follow the 
counselor’s advice in regard to any of the main problems. 

In order to determine the reliability of classification of cases 
according to the foregoing two sets of categories, an “outside”
judge was called in to make independent judgments of the
adjustment of a random sample of 247 cases. A coefficient
of correlation of .82 was found between the “outside” judge’s
classifications and those made by the evaluators. This coefficient
may be interpreted as a high index of validity or as a fair
index of reliability, according to the reader’s own conception 
of the meaning of these two terms. In over half of the cases 
where a discrepancy occurred, the “outside” judge had made 
a higher classification than had the two original judges. This
would seem to indicate that the evaluators had not over-esti-
mated the effectiveness of the counseling. 
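
The agreement check described above can be sketched in a few lines. This is a minimal illustration, assuming the reported coefficient is a product-moment correlation computed on the five adjustment categories coded 1 to 5 (the paper does not state the coding). The ratings below are invented for illustration and are not the study's data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a judge uses one category only")
    return sxy / sqrt(sxx * syy)


# Invented ratings: 5 = satisfactory adjustment ... 1 = much worse.
evaluators = [5, 4, 3, 5, 2, 4, 5, 3, 1, 4]
outside_judge = [5, 5, 3, 5, 2, 4, 4, 3, 2, 4]

r = pearson_r(evaluators, outside_judge)
# Direction of the discrepancies: did the outside judge rate higher or lower?
higher = sum(o > e for e, o in zip(evaluators, outside_judge))
lower = sum(o < e for e, o in zip(evaluators, outside_judge))
print(f"r = {r:.2f}; outside judge higher in {higher} cases, lower in {lower}")
```

Counting the direction of the discrepancies alongside the coefficient mirrors the argument in the text: a high correlation shows agreement in rank, while the direction of the disagreements indicates whether the original evaluators were the more lenient or the more severe judges.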

The question arises as to how much influence the student’s 
subsequent academic achievement (available to the evaluators)
had on the judges’ estimates of adjustment and cooperation.
The correlations between honor-point ratio achieved
after counseling and judgment of adjustment were .23 and
.39 respectively, for General College and the College of
Science, Literature and the Arts.³ The difference between these
coefficients, of borderline significance ($D_z/\mathrm{S.E.}_{D_z} = 2.0$⁴),
may indicate that academic adjustment is more closely related
to judgment of total adjustment for SLA students than for 
General College students. This does not seem unreasonable, 
since SLA students are generally committed to careers in which 
academic achievement is one of the most immediate requisites 
for success. The correlations of honor-point ratio with judgment
of cooperation were of negligible magnitude: .16 ± .05
for General College students and .17 ± .03 for SLA students.
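
The comparison of the two coefficients can be retraced with Fisher's z transformation. The sketch below is illustrative only and rests on two assumptions not stated in the paper: that the numbers of cases behind the correlations equal the full group sizes (195 General College and 498 SLA students), and that the figures attached to the cooperation correlations are probable errors of the form .6745(1 - r²)/√n. The function names are hypothetical.

```python
from math import atanh, sqrt


def fisher_z_ratio(r1, n1, r2, n2):
    """Difference between two z-transformed correlations over its standard error."""
    z1, z2 = atanh(r1), atanh(r2)
    se_diff = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z2 - z1) / se_diff


def probable_error_of_r(r, n):
    """Classical probable error of a correlation coefficient."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Assumed n's: the full General College and SLA groups.
ratio = fisher_z_ratio(0.23, 195, 0.39, 498)
pe_general = probable_error_of_r(0.16, 195)
pe_sla = probable_error_of_r(0.17, 498)
print(f"ratio = {ratio:.2f}, P.E. = {pe_general:.2f} and {pe_sla:.2f}")
```

Under these assumptions the ratio comes out close to the reported 2.0, and the two probable errors round to the .05 and .03 given in the text. Any remaining gap in the ratio would be expected if honor-point ratios were not available for every case.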

In all, data were collected on 987 complete student cases 
who used University Testing Bureau services during the years 
1933-34-35. For the purposes of this study it was deemed 
desirable to analyze as homogeneous a population as possible 
without the sacrifice of too much data. For this reason 498
students from SLA and 195 students from the General College 
were selected. Classified according to their status at the time 
of counseling, in the SLA group were 154 pre-college cases, 
176 freshmen, and 168 sophomores. The General College 
group contained 41 pre-college cases, 125 freshmen, and 29 
sophomores. The pre-college cases were high school seniors 
or recent graduates who came to the Bureau for counseling
in the spring and summer immediately preceding enrollment
in the University.

³ Hereafter designated as SLA.
⁴ Computed according to Fisher’s method (1: pp. 208-10).

That the groups chosen for study were a fairly satisfactory
representation of the total range of ability and achievement
in the undergraduate classes of these colleges can be demonstrated
from the distributions of aptitude test and high school
percentile scores. Of the SLA students, 62.6 per cent and 78.1 
per cent fell at or above the fiftieth percentile in aptitude and 
high school percentile scores, respectively, while 82.8 per cent 
and 65.2 per cent of the General College students fell at or 
below the fiftieth percentile in the same variables. Too often 
there is the tendency to assume that only low ability students 
have a desire for or need of counseling. The distributions of 
the SLA group would seem to refute this assumption. The 
General College population provides us with the opportunity 
to determine whether counseling can be equally effective with 
low ability students. 


The representativeness of our groups in terms of SLA 
and General College freshmen was determined by a compari- 
son of high-school average grades transmuted into percentiles. 
Unfortunately, statistics on sophomores in these colleges were 
not available. Because of the known elimination of freshmen 
with lower percentiles, the sophomore population should be 
higher on the average. Since our experimental population 
consisted of students from both classes, this analysis of repre- 
sentativeness is not precise. The freshmen in our group were 
compared with representative SLA and General College 
samples. For SLA the combined mean for a sample of 2,157 
freshmen students of the fall classes of 1933-34-35 was 65.45 
as compared with 69.59 for our experimental freshman group. 
Although this small difference of 4.14 is reliable (C. R. of 
3.39), it does not represent a very significant one as far as 
the purpose of this study is concerned. The comparison of the 
means of the General College groups yields similar results. 
The combined mean of representative freshmen of the 1933 
and 1935 freshman classes was 34.00⁵; for our group it was
40.25. Although the difference is somewhat larger than that 
in the SLA group, it is not so reliable (C. R. of 2.64). 
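
The remark that the larger General College difference is the less reliable one can be checked by simple arithmetic. The sketch assumes the conventional definition of the critical ratio, a difference divided by the standard error of that difference, which the paper does not spell out. The means and ratios are those reported above; the helper name is hypothetical.

```python
def implied_standard_error(difference, critical_ratio):
    """Standard error of a difference implied by its critical ratio."""
    return difference / critical_ratio


# Reported means of high-school percentiles and reported critical ratios.
sla_diff = 69.59 - 65.45
gc_diff = 40.25 - 34.00

sla_se = implied_standard_error(sla_diff, 3.39)
gc_se = implied_standard_error(gc_diff, 2.64)
print(f"SLA: difference {sla_diff:.2f}, implied S.E. {sla_se:.2f}")
print(f"General College: difference {gc_diff:.2f}, implied S.E. {gc_se:.2f}")
```

On this reading the General College difference carries a standard error roughly twice that of the SLA difference, as would be expected from the smaller numbers of cases, so the larger difference yields the smaller critical ratio.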


We may conclude from these two analyses of the nature 
of our counseled groups that they were generally representative
of their total populations and of the total range of ability
to do college work. It is interesting to note that the students
who are counseled by the Testing Bureau, contrary to the
opinions of many, are not the students of inferior ability, but,
if anything, are slightly superior to the general undergraduate
population of these two colleges.

⁵ From unpublished data collected by Dr. Ruth Eckert of the University of
Minnesota.


Results of the Experiment 

Degree of Cooperation and Adjustment—Previous studies 
of the effectiveness of counseling which used similar methods 
have reported results in terms of percentages. The propor- 
tions of our groups who were classified as satisfactorily or 
partially adjusted (82.8 per cent of SLA and 86.2 per cent of 
General College) compare favorably with those reported 
in English studies. Oakley (3) and Macrae (2), working 
with small populations of younger students, reported 95 per 
cent and 55 per cent, respectively, as the proportions who 
followed advice and who were satisfied and successful in their 
occupational adjustment. Rodger (4), with a larger popula- 
tion, reports 79 per cent successful adjustment. Seipp (5), 
using a methodology almost identical with ours, analyzed the 
case records of 100 adults diagnosed and advised by the 
Adjustment Service of New York. She found that 57 per cent 
made a satisfactory adjustment subsequent to counseling. Our 
results are even more impressive when analyzed in terms of 
those who cooperated in following the counselor’s advice. In 
these terms the percentage of the SLA students satisfactorily 
or partially adjusted is 93.5 and the percentage of General 
College students is 96.3. 

Our data also indicate that the counseling was equally effec- 
tive, if not more so, in gaining the cooperation of the student. 
For SLA 70.9 per cent cooperated wholly and 20.1 per cent 
partly, while for General College the percentages were 69.7 
who cooperated wholly, and 24.1 partly. Viteles (7), diag- 
nosing and advising 75 adolescents, found that 58 per cent 
followed advice completely and 21 per cent partly. 

Since the SLA and General College groups differed so 
markedly in college aptitude, it was interesting to determine
whether there was any real difference between the adjustment
classifications of the two groups of experimental cases. To
test this hypothesis, the chi-square test of independence
was used.⁶ The result (chi-square value of 3.48, p > .05)
indicates that there was no significant difference in the
adjustment achieved by the two groups. A similar analysis of
the cooperation classifications yields a similar result (chi-square
value of 2.51, p > .05). A further analysis showed that the
length of the student’s attendance in the University was generally
unrelated to either adjustment or cooperation; the chi-square
values for the groups were not significant.
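
The test of independence used here can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the paper does not publish the cell counts, so the table below is invented (only the row totals of 498 and 195 come from the text). The statistic is the familiar sum of squared deviations of observed from expected frequencies divided by the expected frequencies, and the five and one per cent points are standard tabled values for one to four degrees of freedom. Function names are hypothetical.

```python
# Tabled 5 per cent and 1 per cent points of chi-square, by degrees of freedom.
CRITICAL_POINTS = {
    1: (3.841, 6.635),
    2: (5.991, 9.210),
    3: (7.815, 11.345),
    4: (9.488, 13.277),
}


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a rows-by-columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count from the proportions observed in the total distribution.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


def verdict(chi2, df):
    """Place the statistic relative to the 5 and 1 per cent points."""
    five, one = CRITICAL_POINTS[df]
    if chi2 >= one:
        return "p < .01"
    if chi2 >= five:
        return "p < .05"
    return "p > .05"


# Invented counts. Columns: satisfactory, some progress, no change or worse.
hypothetical = [
    [290, 122, 86],  # SLA (row total 498)
    [118, 50, 27],   # General College (row total 195)
]
chi2, df = chi_square_independence(hypothetical)
print(f"chi-square = {chi2:.2f}, df = {df}, {verdict(chi2, df)}")
```

With these invented counts the two colleges show nearly the same proportions in each category, so the statistic falls below the five per cent point, which is the pattern the text reports for the real data. Combining the sparse categories before testing, as the authors did, keeps every expected frequency large enough for the tabled points to apply.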

Adjustment versus Degree of Cooperation—In addition to 
these direct analyses, we have attempted to shed light on the 
definition of the conditions which make adjustment more 
probable as an outcome of clinical counseling. First and fore- 
most of these conditions is that the student cooperate with 
his counselor. Anyone who has had intimate experience in 
counseling will have observed that the cultivation of a cooper- 
ative attitude usually precedes effective counseling. That 
the greater proportion of adjusted students found among 
those students who cooperated is not accidental is clearly indi- 
cated by the test results of the independence of these two 
variables. The chi-square values of 115.62 and 47.44 for 
SLA and General College students are both highly significant 
(p < .01). This means that we may assume that a student 
who cooperates with the counselor in attempting a solution
of problems will in all probability achieve satisfactory adjustment
as defined above. Only the General College sophomores
did not exhibit a statistically significant relationship. The
restriction of the range of the variables necessitated by the
small number of cases in this group may explain this fact.

⁶ This statistic may be used to test the independence of distributions from a
real or hypothetical distribution (6: chap. 1). As used in this study, the
expected distributions were based upon the proportions of the five classes of
adjustment observed in the total distribution. The formula for computation
appropriate for this type of analysis is:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

yielding a value which, by use of a table of chi-square distributions, is translated
into an estimate of the probability that such a value could have been obtained
for additional samples drawn from the same general population. We shall use
the conventional five per cent and one per cent points as our confidence limits.
These points correspond approximately to values 1.96 and 2.58 standard
deviations from the mean of a normal distribution. Because of the small number
of cases, some of the categories were combined in all of the chi-square tests
used. Five was the smallest number of cases permitted in any one category.

Expectancy of Adjustment According to Type of Problem 
—A previous study by Williamson (8) has shown that coun- 
selors tend to specialize in the types of problems that they 
treat. It is important, therefore, to determine the effective- 
ness of counseling with respect to different types of problems. 
Since in most cases students experience more than one prob- 
lem, classification in any one category, e.g., vocational 
problem, will include students who may also have an educa- 
tional problem, a social problem, or any other problem or 
combination of problems. In view of this fact, if the voca- 
tional category shows a significantly greater proportion of 
adjusted students than the educational or the emotional 
category, then evidence of the differential effectiveness of the 
counseling will have been discovered. 

An analysis of our data gives a clear indication that the 
adjustment expectancy is not so marked for social-personal- 
emotional problems as for vocational and educational types 
of problems. While not all of the differences are statistically
significant (only the total General College group and the General
College freshmen showed significant ones: chi-square, 29.84,
p < .01 and chi-square, 32.99, p < .01), the trends
are consistently in the same direction. Since it has already
been shown that cooperation can be assumed as a counseling 
condition necessary to adjustment, it is not surprising to find 
that there is a greater expectancy of cooperation for voca- 
tional and educational problems than for social-personal- 
emotional ones. 

Expectancy of Adjustment According to Status of Voca- 
tional Choice—Since the counseling being evaluated in this 
experiment is primarily educational and vocational, the types 
of changes in vocational orientation required should be of 
importance for the expectancy of adjustment. Four possibilities
were defined: (a) confirmation of the student’s choice
by the counselor; (b) recommendation by the counselor of 
some choice other than the student’s; (c) recommendation of 
a choice by the counselor, because of the student’s indecision 
at the time of the original contact; (d) deferment of choice 
on the counselor’s advice at the time of original contact. It 
had generally been assumed that the counselor is more likely 
to bring about adjustment when he has only to confirm the 
student’s previous choice. The results of our experiment do 
not support this assumption. They indicate that it makes no 
difference, for this type of counseling, whether the student’s 
choice was confirmed or changed or whether he was undecided 
at the time of the first interview. But in those cases where 
choice is deferred, the expectancy of adjustment is significantly 
less (chi-square of 28.59, p < .01). In the case of cooperation,
what “ought to be true” actually is true. As one would
suppose, greater cooperation is to be expected from those 
students whose vocational choices were confirmed (chi-square 
of 15.7, p < .01). 

Aptitude and Achievement in Relation to Adjustment and 
Cooperation—One might expect that ability and previous 
achievement of students who come for counseling would be 
positively related to expectancy of cooperation and adjust- 
ment. This problem was attacked by studying the aptitude 
and achievement characteristics of students in each of the co- 
operation and adjustment categories. The analysis of the 
variance in aptitude test scores gives evidence that this assumption
cannot be held in terms of the ability test used.⁷ The
variance ratios were of such a small degree that the probabili- 
ties that they represented the same population were greater 
than five in a hundred. This means that low ability students 
are just as likely to be cooperative and adjusted as high 
ability students.

⁷ Snedecor’s tables of F (6: p. 174) were used to estimate the probabilities
of getting as large a variance ratio from samples of a homogeneous population.
Here again the five per cent and one per cent points were taken as the limits
of confidence.

On the other hand, the analysis gives reliable evidence
that high school achievement is positively related to cooperation
and adjustment (General College, F is 8.09, p < .01;
SLA, F is 5.45, p < .01). This relation is further emphasized 
by the finding that for any degree of adjustment, those stu- 
dents who cooperated were, for the most part, those with 
previously higher achievement. Previous college achievement 
could be analyzed validly only in relation to cooperation, 
since it already had entered into the estimate of adjustment. 
The results here are not conclusive. While the SLA data 
yielded a significant variance ratio (8.22, p < .01), the Gen- 
eral College data did not. 
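
The variance ratios reported in this section come from an analysis of variance across the cooperation or adjustment categories. The sketch below shows the computation of a one-way F ratio under the assumption that each category is treated as an independent group. The percentile scores are invented for illustration, and `one_way_f` is a hypothetical helper name; the resulting ratio would be referred to a table of F with the returned degrees of freedom.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more cases than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores about their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented high-school percentiles for three cooperation categories.
followed_wholly = [82, 75, 68, 90, 71, 77]
followed_in_part = [64, 70, 58, 73, 61]
did_not_follow = [49, 60, 55, 42]

f_ratio, df_b, df_w = one_way_f([followed_wholly, followed_in_part, did_not_follow])
print(f"F = {f_ratio:.2f} with {df_b} and {df_w} degrees of freedom")
```

A large ratio indicates that the category means differ by more than the variation within categories would lead one to expect; a ratio near one, as reported for the aptitude scores, indicates that the categories are indistinguishable on that variable.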

Number of Interviews versus Adjustment and Coopera- 
tion—With respect to SLA students, variation in number of 
interviews indicates that the counselor had the most interviews 
with students who were partially adjusted (General College, 
F is 4.13, p < .05; SLA, F is 20.84, p < .01). Thus those 
students who are satisfactorily adjusted, characteristically, 
do not require so many interviews. Likewise, those students 
whose maladjustment is of such a nature (e.g., very low ability) 
as to offer little probability of adjustment are not interviewed 
so frequently. It seems, then, that those students who present 
difficult problems but give promise of adjustment seek counseling
interviews most frequently. For SLA there is slight
evidence for a positive relationship of number of interviews 
with judgment of degree of cooperation (F is 3.07, p < .05). 
In the case of General College students, there is a negative 
relationship between adjustment and the number of interviews. 
The students in the low adjustment categories apparently 
show a greater willingness or are more encouraged to return 
for further counseling than are the more satisfactory adjustment
groups.

Time Interval versus Evaluation—The significance of the 
time elapsed between the first counseling interview and the 
follow-up interview should be of value in indicating the 
optimum period for an evaluation experiment. Since there is 
no reason to suppose that special selection has operated in the 
selection of the time at which the adjustment groups were 
studied, it is reasonable to infer that observed differences are 


129 





EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


differences in time necessary for reaching that level of adjust- 
ment. While the previous analysis of SLA data indicated a 
greater number of interviews for the students classified in the 
partially adjusted category, it is evident that students in this 
category required the shortest time to achieve their degree of 
adjustment. Analysis of the General College data supports this 
result, a perplexing one. However, a more interpretable result 
is secured when the data are analyzed in terms of the interre- 
lation of cooperation and adjustment. The trend is in the 
direction of a shorter time interval for students in any degree 
of adjustment who cooperated to a greater degree with the 
counselor. The inference may be made that those students 
who cooperated reached a given level of adjustment in a 
shorter time. The difference averages a little over two months 
in an average evaluation period of 16 months. The F value 
of 3.4 is beyond the one per cent point. 


Summary 


This evaluation of the clinical counseling practiced in the 
Testing Bureau of the University of Minnesota has attacked 
two basic problems: (a) What proportions of students were 
aided by the Bureau’s counseling to achieve a better adjust- 
ment? (b) What conditions and characteristics of counseled 
students are most conducive to a favorable prognosis of subsequent
counseling?


In answer to the first question, counseling was effective in 
achieving the cooperation of and in improving the adjustment 
of over 80 per cent of the students in our groups. This is 
especially significant in that the analysis and classification of 
cases were carefully defined and controlled, having been made 
by judges who had not been involved in any of the counseling. 

The conditions and characteristics favorable to adjustment 
include the following: 

1. Cooperation with the counselor was positively related 
to adjustment and those students who cooperated reached their 
level of adjustment in a shorter period of time than those who
did not.

2. Students experiencing educational and vocational prob- 
lems were more successfully counseled than were those with 
dominant social-personal-emotional problems. 

3. Contrary to belief, our data indicate no differences in 
adjustment among counseling cases classified as vocational 
choice confirmed, altered, or undecided at the first contact. 
But, if vocational choice is deferred by the counselor, the prog- 
nosis of adjustment is less favorable. 

4. Higher high school or previous college achievement is 
positively related to cooperation and adjustment. But level 
of ability, as measured by the aptitude test used in this experi- 
ment, is not related. 

These conclusions may be interpreted as limitations either 
of the students involved or of this type of counseling. In the 
case of the type of problem, it is likely that a limitation of 
counseling is disclosed. Counseling that is educationally and 
vocationally oriented is not likely to deal so effectively with 
social-personal-emotional problems. On the other hand, it 
does not seem probable that any type of counseling or improve- 
ment in treatment techniques can do much for a student with 
a very low achievement background insofar as the types of 
adjustment involved in this evaluation experiment are 
concerned. 

Certain relations of an ambiguous nature and therefore 
demanding further study were observed. There was evidence 
that the counselor conducts more interviews with students who 
are judged as partially adjusted; yet this same group reached 
their level of adjustment within a shorter period of time. 
Our data do not indicate whether or not the counselor tends 
to intensify his work with certain students by conducting many 
interviews within a short period of time. 

There is one conclusion that this study should have made 
clear. The evaluation of counseling is not a casual process, 
easily carried out. Indeed, such a study represents a combina- 
tion of careful and rigorous case reading, many days and weeks 
of interviewing, prolonged clerical and statistical labor, and 
above all a period of patient waiting for the counseling cases
to mature to the stage wherein adequate data are available
for critical evaluation. 


REFERENCES

1. Fisher, R. A. Statistical Methods for Research Workers (7th
ed.). London: Oliver and Boyd, 1938. 356 pages.

2. MacRae, A. “A Follow-up of Vocationally Advised Cases,”
Journal of the National Institute of Industrial Psychology,
V (1931), 242-47.

3. Oakley, C. A. “A First Follow-up of Scottish Vocationally
Advised Cases,” Human Factor (London), XI (1937), 27-31.

4. Rodger, T. A. “A Follow-up of Vocationally Advised Cases,”
Human Factor (London), XI (1937), 16-26.

5. Seipp, Emma. A Study of One Hundred Clients of the
Adjustment Service (Adjustment Service Series, Report XI).
New York: American Association for Adult Education, 1935.
30 pages.

6. Snedecor, G. W. Statistical Methods. Ames: Collegiate Press,
Inc., 1937. 341 pages.

7. Viteles, M. S. “Validating the Clinical Method in Vocational
Guidance,” Psychological Clinic, XVIII (1929), 69-77.

8. Williamson, E. G. “Faculty Counseling at Minnesota. An
Evaluation Study of Social Case Work Methods,” Occupations,
XIV (1936), 426-33.

9. Williamson, E. G. How to Counsel Students. New York:
McGraw-Hill, 1939. 561 pages.

10. Williamson, E. G. and Bordin, E. S. “Evaluation of Vocational
and Educational Counseling: A Critique of the Methodology
of Experiments,” Educational and Psychological
Measurement, I (1941), 5-24.

11. Williamson, E. G. and Darley, J. G. Student Personnel Work.
New York: McGraw-Hill, 1937. 313 pages.

















CONTRIBUTION OF TESTS TO RESEARCH IN THE 
FIELD OF STUDENT PERSONNEL WORK 


RALPH W. TYLER 
University of Chicago 


THE USE of tests is fundamental to many aspects of
student personnel work. In the selection of students, in
identifying their potentialities, their problems, and their diffi- 
culties, in checking on the effectiveness of procedures used 
in providing for personal development, in vocational place- 
ment and follow-up, personnel workers have learned to use 
a wide range of tests and to depend on the results of tests 
as a basic part of the personnel program. Although the place 
of tests in the practice of personnel work has been well out- 
lined, the contribution to be expected from tests in connection 
with personnel research has not been so clearly indicated. I 
am differentiating research from practice in the field of student 
personnel work by defining research as the process by which 
basic facts, theories, principles, instruments, and procedures 
are developed, thus providing a rational framework upon 
which the practice of student personnel work can be under- 
stood and elaborated. This distinction may perhaps be made 
clearer by illustration. 


It is a common practice of the student personnel officer 
to administer reading tests to incoming freshmen, to study the 
results, to identify certain students who received relatively 
low scores on the reading tests, and to recommend a remedial 
program in reading for some of these students. Research 
which finds out what reading demands are likely to be made 
by the various freshmen courses, which devises valid instru- 


133 





rt a 5 ee ne ees ye ce 


SS i a ta teh mg ny 


Fp ert ar 


EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


ments for measuring these reading abilities, which estimates 
the probable frequency of inadequate reading abilities among 
freshmen, which develops theory and principles regarding the 
relation of reading development to other aspects of the 
student’s development, and which establishes the probable 
validity of various types of remedial reading procedures 
would represent the essential framework upon which improved 
personnel practices relating to reading can be built. Practice 
and research are complements in a sound professional growth. 


Because student personnel work may be concerned with 
all aspects of the personal and social development of students, 
its problems relate to many previously organized fields of 
research such as physiology, psychology, sociology, anthro- 
pology, psychiatry, and education. Obviously, personnel work- 
ers have drawn and must draw upon these various organized 
fields of research for many of their concepts, instruments, and 
practices. However, many problems which the student per- 
sonnel worker faces cut across two or more of these fields 
and are likely to involve research aspects not adequately 
investigated by any one of these disciplines alone. The prob- 
lems which do involve two or more organized disciplines are 
the problems which in general must be attacked by research 
workers in the field of student personnel. May I indicate 
some of these problems and suggest contributions which tests 
have made or can make to research on these problems? 

One major research problem is to delineate clearly desir- 
able goals for a student personnel program. Accepting the 
general function of personnel work to be the facilitation of 
well-rounded personal and social development of students, it 
is evident that this function must be defined more clearly in
the case of a given college or type of college so as to indicate
the aspects of development to be promoted and the desired
relation among these various aspects. This clearer picture of
the phases of student development to be given attention and
their relation is essential to the intelligent direction of a program
aimed at facilitating well-rounded development of the
individual student.




It is obvious that a profession should have its goals clearly 
and definitely in mind; it is not so obvious that the formula- 
tion of these goals for the field of student personnel is a 
research problem of considerable magnitude. The difficulty 
of the task is partly due to the complexity of human develop- 
ment. Well-rounded personal and social development includes 
physiological, psychological, and social aspects. Furthermore, 
these various aspects are interrelated, that is to say, physiolog- 
ical development influences and is influenced by psychological 
and social development. Correspondingly, psychological 
development influences and is influenced by physiological and 
social development; and social development influences and is 
influenced by physiological and psychological development. 
Hence, although research in the several established disciplines 
helps to identify characteristics of normal physiological 
growth, of psychological maturation, and of social develop- 
ment, special research of a co-ordinated or integrated nature 
is necessary to establish the desirable balance among these 
several aspects of student development. 


A second factor which complicates the formulation of goals 
for this field is the relation of student personnel work to the 
rest of the college program. In order that a college have the 
most effective influence upon its students the various phases 
of the college program, curricular and extracurricular, need 
to have some underlying coherence, that is, they must be bound 
together by common purposes. The major purposes of a 
college are educational, and the acceptable goals of student 
personnel services also should be at least in harmony with the 
educational purposes of the institution, and preferably they 
should serve to promote these educational purposes. In the 
actual practice of student personnel work there is danger that 
we shall carry on activities day after day without carefully 
considering their relation to the primary aims of our institu-
tion. This may lead to a short-sighted program in which
immediate goals are attained without really promoting the 
ultimate goals of the college. It is possible, for example, to 
work out a plan of housing which provides for very quick
adjustments of the students to their classmates, and yet by the
nature of the housing plan, cliques may be encouraged and 
the fundamental educational objective of learning to under- 
stand people with very different backgrounds and to enter 
sympathetically into the lives of persons very different from 
ourselves may be hindered rather than promoted. Or, a social 
counselor may feel that her job is well done when she has 
helped to increase the proportion of women students who have 
regular dates, whereas the ultimate educational objective is 
to get a broader understanding of human behavior including 
a sympathetic understanding of and adjustment to the opposite 
sex. Continued dates with the same individuals in many cases 
may retard the attainment of this objective rather than help 
it. It seems necessary, therefore, to formulate goals for 
student personnel work in such a way that they are closely 
related to the major educational objectives of the institution. 


This implies that the student personnel worker in close 
collaboration with other members of the school or college 
staff will need to examine the various types of studies which 
suggest possible goals of student development. They will need 
to consider the investigations of the sociologist, the social 
anthropologist, the social psychologist, the economist, and the 
political scientist, to identify the demands which our culture 
makes upon young people and to understand the effect of 
cultural pressures upon the individual and his group. These 
studies of the social scientists represent an important com- 
ponent from which goals for student personnel work will be 
formulated. 

But an examination of results of research in the social 
sciences is not enough. It is also necessary to examine studies 
of student health and investigations in the fields of physiology, 
nutrition, and psychiatry—for these help to clarify the concept 
of desirable biological development and also to indicate possi- 
ble deficiencies which students may be helped to overcome. 

A third component of research regarding goals for student 
personnel work is the field of values. Values need to be con- 
sidered carefully not only as possible student goals but also
because values, individual and cultural, condition the student’s
development in many ways. The ideals which young people 
absorb from contact with the culture have a more potent 
influence upon student goals, student activities, and the satis- 
factions and disappointments of college life than is commonly 
realized. Any comprehensive formulation of goals for student 
personnel work needs to consider the values which the school 
or college may be expected to promote and the way in which 
school or college experiences may influence these values. 


I have suggested several of the strands which need to be 
considered in delineating goals for student personnel work. It 
is obvious that the selection of goals to be given particular 
emphasis in a particular college depends upon several factors. 
One is the college’s conception of the good life and its 
counterpart—the desirable person. This conception will rep- 
resent not only specific items such as physical health, social 
concern, personal integrity, and the like, but it will also involve 
some idea of the relation of these various aspects. At this 
point it is very necessary for the student personnel worker to 
have a workable but comprehensive theory of personality 
structure and function, and of personality development. 
Because we do know that the human organism shows a con- 
siderable degree of unity in its reactions, because we do know 
that physical, social, and psychological aspects are interrelated, 
we realize that one cannot treat each aspect of a student’s 
development in isolation from the others. Some theory as to 
how these aspects are related, how they function together, 
and how they may be developed together is essential to pro- 
vide a rational basis for personnel work. If the student per- 
sonnel worker together with other members of the school or 
college staff has identified more specifically the aspects of 
human development which the school or college seeks to pro- 
mote, and if he has a comprehensive theory of personality 
development, it is possible to formulate clear yet comprehen- 
sive goals for his own work and to avoid treating a student 
as though he were a mechanical collection of specific reactions. 

What contributions have tests made or can tests make to
this area of research? A test provides a controlled situation
in which certain specified types of behavior may be studied 
and certain phases of this behavior may be measured. In the 
effort to formulate a coherent theory of personal and social 
development, various types of tests must be constructed so 
that students can react in ways which involve the relation of 
biological, social, and psychological phases of behavior. 
These tests may enable us to see more clearly how these 
phases of behavior are related. Furthermore, any college, 
after determining the tentative goals of its student personnel 
work, can employ tests to determine which of these goals are 
of primary importance to its students. Conscious attention 
need not be given in the college program to those points at 
which students are already developing satisfactorily. That is 
to say, tests contribute to this area of research both in develop- 
ing a comprehensive set of goals and in identifying the goals 
which need major attention in a particular college at a 
particular time. 


A second area of research in the field of student personnel 
work is the testing of the fundamental bases upon which a 
student personnel program is built. A well-rounded plan of 
personnel services is a recent addition to the college campus. 
Most of the schemes have been based upon assumptions which 
have not been adequately tested. The principles of organiza- 
tion, of administration, of the selection of the staff, of the 
training of the faculty—all are in need of careful verification. 
These principles seem to the administration or faculty of the 
given institution to be sound, but in many cases they have been 
drawn from fields and experiences which are not strictly 
parallel to the field of student personnel, and it is likely that 
some of these principles are not appropriate as part of the 
foundation of the program of student personnel services. 
Research provides a check on the validity of the basic founda- 
tion of the personnel program. Such research involves com- 
prehensive evaluation of an entire personnel program or of 
particular procedures. 




A comprehensive evaluation provides evidence showing 
how far each of the important objectives or goals of student 
personnel work is being attained. Since these goals involve 
various aspects of student development, tests of various sorts 
are essential in order to find out the points at which students 
are developing adequately or the points at which development 
is unsatisfactory. For example, this research requires tests 
of physical development, of health, of personal-social adjust- 
ment, of attitudes, of interests, of skills, of information 
acquired, and the like. It also involves a periodic program 
of testing so as to estimate the progress being made by the 
students, and correspondingly, their rate and degree of 
development. Furthermore, an adequate research program 
provides a follow-up of students after they have been grad- 
uated from college and have gone out into life. These follow- 
ups should probably be made from five to ten years after 
graduation and should include the collection of data regarding 
those objectives which have most permanent significance. This 
probably would include evidence regarding intellectual inter- 
ests, health practices and attitudes, marital adjustment, social- 
civic interests and activities, and maturity of aesthetic inter- 
ests. Such a follow-up study provides an important type of 
data regarding the continuing development of students and, 
therefore, it is a significant phase of the evaluation of the per- 
sonnel program. 


The checking of the fundamental bases upon which the 
student personnel program is built is an area of research 
which has largely been dependent upon valid tests. The recent 
accelerated development of a wider range of tests has been 
accompanied by a corresponding increase in evaluative studies. 
Tests are making an important contribution to this area of 
research. 


A third area of research in the field of student personnel 
work which involves tests is the construction and validation 
of instruments to facilitate the personnel program. Tests 
represent the major group of these instruments. Various tests 
have been constructed for use in selecting students likely to 


139 





EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


benefit from a given college program. Much research is still 
needed in identifying important characteristics of young people 
which can be used as a basis for college selection and for plan- 
ning programs of educational and vocational guidance. Thus 
far, these tests have largely consisted of measures of verbal 
facility, of numerical manipulation, and of the acquisition of 
information. Tests of higher intellectual skills, of interests, of 
personal-social adjustment, and of attitudes are just beginning 
to contribute markedly to the selection and guidance work. 

Tests already developed have greatly facilitated the iden- 
tification of students needing special attention, but many new 
instruments are also needed. Tests are widely available for 
identifying certain types of reading difficulty and certain types 
of subject-matter deficiency. New instruments are needed, how- 
ever, to measure other types of psychological and social reac- 
tion which have fundamental significance for success in college 
and in life, and which should be identified early enough so that 
a program to facilitate development may be begun. 

A similar condition exists with regard to tests useful in 
the vocational placement of students. Tests of some of the 
essential vocational skills have been of great value. Tests for 
identifying certain vocational interests are showing promise.
However, some of the fundamental vocational attitudes, 
habits, and ways of thinking have not been clearly identified, 
nor have satisfactory tests for them yet been developed. The 
future contributions of tests of this type are likely to be large. 

New tests are being constructed to help in evaluating per- 
sonnel programs and procedures. Judgments of students and 
faculty have not only been supplemented by more careful case 
studies and observational records, but tests of attitudes, of 
interests, of habits and practices, of information and skills are 
becoming available for a more comprehensive evaluation. 
Additional tests are still needed, and many are in the process 
of construction. 

I have attempted to suggest briefly the place of tests in 
three areas of research, namely, in delineating goals for 
student personnel work, in checking the fundamental bases
upon which personnel programs and procedures are developed,
and in constructing essential instruments for personnel work. 
Tests have already made an important contribution to these 
three areas of research, but the future contributions should be 
far greater than those of the past. The limitations of the 
contributions of the past seem to me to have been due to sev- 
eral factors which now can be largely overcome. 


In the first place, student personnel work originated 
largely from specific maladjustments within the traditional 
college program. Particular problems relating to the conduct 
and morals of students, their social life, or their housing led 
to the provision of special staff members to iron out these 
difficulties. Only within recent years has there been wide recog- 
nition of the broad implications of student personnel work and 
of the need for some coherent philosophy and program. 
Naturally, tests used in the student personnel field frequently 
were taken over as they had been developed for other purposes
and used without consideration of the behavior patterns which 
these tests implied. It seems to me that we are now ready to 
formulate a coherent conception of personal and social adjust- 
ment and to examine possible tests in the light of our concept, 
discarding or modifying tests which do not appropriately fit 
this concept and developing new tests that are in harmony 
with it. 

With this bit-by-bit accumulation of personnel responsibili- 
ties in the college program, it was natural that the tests used 
should largely be built upon a type of atomistic concept of 
human behavior, and that the test results should be sum- 
marized as single scores or as separate parts added together 
to form a total score. In recent years we have seen more 
clearly how to construct tests involving greater organization 
of behavior and how to summarize results in terms of descrip- 
tive scores relating various parts of a test, thus getting a 
more coherent picture of the student’s response. This elimi- 
nation of the single composite score is an important step in 
increasing the contribution tests make to the field of student 
personnel work. 




An additional reason for my belief that tests will make an 
increasing contribution to the field of student personnel work 
is the wide recognition of a broad definition of tests. No 
longer are tests conceived only as paper-and-pencil examina- 
tions. Tests are increasingly considered as controlled methods 
for obtaining a sample of a student’s reactions under certain 
specified conditions. With this recognition that a test is a 
means of sampling certain aspects of human behavior, atten- 
tion is now being focused upon clearer definitions of those 
aspects of human behavior which need to be sampled by means 
of tests. In educational testing twenty years ago, primary 
attention was given to sampling the content of textbooks which 
students were expected to remember and to sampling certain of 
the subject-matter skills, such as writing or numerical com- 
putation. It is now recognized that other aspects of behavior 
are important, such as the way in which the student attacks 
problems, the types of interests he is developing, the attitudes 
he has, his response to aesthetic experiences such as literature, 
music, and the arts. 

With greater clarification of the nature of testing has come 
a better specification of the behavior to be tested. Twenty 
years ago a test in chemistry would be built by specifying the 
topics, that is, the content to be sampled. No conscious effort 
was made to specify the type of reaction the student might 
be expected to make to this content. Now we recognize that 
we must specify not only the content but also the kind of reac- 
tion expected of the student, the sort of situation in which such 
reaction can be expected and, if possible, the kind of purpose 
which a student would have when reacting. By specifying these 
four aspects of behavior we have a much clearer idea of what 
we are trying to test, and this increases the probability that 
we shall control the testing situation sufficiently to provide 
a satisfactory test. 
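
The four aspects named above can be written down as a small record, which makes it easy to see when a proposed test has been specified by content alone. The following is only an illustrative sketch: the class name `BehaviorSpec`, its field names, and the chemistry entries are hypothetical and are not taken from any published test plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorSpec:
    """One behavior to be sampled by a test (hypothetical structure)."""
    content: str         # subject matter to be sampled
    reaction: str        # kind of reaction expected of the student
    situation: str       # situation in which the reaction can be expected
    purpose: str = ""    # student's purpose, stated only when it can be known

    def is_fully_specified(self) -> bool:
        # True only when all four aspects have been written down.
        fields = (self.content, self.reaction, self.situation, self.purpose)
        return all(text.strip() for text in fields)


# Illustrative entries only; the wording is assumed for the example.
content_only = BehaviorSpec(
    content="oxidation and reduction",
    reaction="",
    situation="",
)
four_aspects = BehaviorSpec(
    content="oxidation and reduction",
    reaction="applies the principle to explain an unfamiliar phenomenon",
    situation="a written description of a laboratory observation",
    purpose="to account for what was observed",
)
```

A specification of the older kind, which lists topics only, fails the check, while one that states all four aspects passes it.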




GRADE AND AGE NORMS FOR THE MINNESOTA 
VOCATIONAL TEST FOR CLERICAL WORKERS¹


GWENDOLEN G. SCHNEIDLER 


University of Minnesota 


PROGRESS in the applications of psychology, especially in
the field of aptitude measurement, will be made by work-
ing intensively on the measuring instruments which we already
have, rather than by adding to the large number of devices
about which we have insufficient research to justify scientific
application. In line with this belief we have investigated cer-
tain problems connected with the Minnesota Vocational Test
for Clerical Workers. The portion of this research to be
reported here deals with a normative study of this test.

The usefulness of the Minnesota Vocational Test for 
Clerical Workers has been seriously curtailed because of the 
fact that norms have been established only for adults in the 
general population and for employed adult clerical workers 
(6). This limitation is the reverse of the more usual and 
serious one where norms exist for school populations while 
no adequate norms exist for adults and for criterion groups. 
The problem of appropriate norms for tests is one of the most 
urgent ones which counselors face in applying measuring 
instruments in guidance programs where individual analysis 
and diagnosis is an indispensable first step. The Minnesota
Vocational Test for Clerical Workers was standardized on
adults, and excellent norms were developed and reported in
the Bulletins of the Employment Stabilization Research Insti-
tute (5, 9) and in the test manual (6). The test with its
norms for adults was used to advantage by the Adjustment
Service in New York City, a community guidance agency for
adults, and by other agencies concerned with the counseling
of adults. As the test has become more widely adopted, how-
ever, it has been applied in many situations, especially to youth
populations for which the significance of test scores was not
known. Some workers have devised local norms which have
reflected selective factors of sampling. Limitations in interpre-
tation have necessarily accompanied limitations in the selection
of the sample. What has been needed is a normative study
of this test based upon a large sample of youth representative
of the populations at the junior and senior high school levels.
With such research it becomes possible to apply the Minnesota
Vocational Test for Clerical Workers to the age range for
which the test is most appropriate from the standpoint of
educational and vocational guidance.

¹The cooperation of many persons has been necessary for the completion
of this study and the author wishes hereby to express her appreciation. Professor
Donald G. Paterson directed the construction of this measuring instrument by
Dr. Dorothy M. Andrew and has followed through with advice and helpful
suggestions in subsequent research. Assistance in the preparation of some of the
materials for this study was furnished by the personnel of Work Projects
Administration, Official Project Number 665-71-3-69, Sub-Project Number 229.


The Minnesota Vocational Test for Clerical Workers is 
composed of two subtests: Test I consists of 200 paired num- 
bers varying in length from three to 12 digits; Test II con- 
sists of 200 paired names varying in length from seven to 
16 letters. Slight changes had been made in half of the paired 
items and the subject is asked to compare the paired items 
as rapidly as possible, checking those pairs which are identical. 
He is allowed eight minutes for Test I and seven minutes for 
Test II. Scores are calculated on each of the two subtests 
using the “right minus wrong” formula. The administration 
of the test is described in the manual and other sources
(6, 4, 9).
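
The "right minus wrong" formula can be illustrated with a short sketch. The article names the formula but does not spell out the counting rules, so the following rests on assumptions: an item is taken as right when the subject's mark agrees with whether the pair is truly identical, every other attempted item is taken as wrong, and items not reached within the time limit are ignored. The function name and the sample pairs are hypothetical and are not items from the test.

```python
def right_minus_wrong(pairs, checked, attempted):
    """Score one subtest as number right minus number wrong.

    pairs     -- list of (first, second) strings, in test order
    checked   -- set of item indexes the subject marked as identical
    attempted -- number of items reached before time was called
    """
    right = 0
    wrong = 0
    for index, (first, second) in enumerate(pairs[:attempted]):
        truly_identical = first == second
        marked_identical = index in checked
        if marked_identical == truly_identical:
            right += 1   # mark agrees with the key (assumed rule)
        else:
            wrong += 1   # false check or missed identical pair
    return right - wrong


# Hypothetical items in the style of the two subtests.
sample_pairs = [
    ("66273894", "66273984"),                    # differs
    ("527384578", "527384578"),                  # identical
    ("New York World", "New York World"),        # identical
    ("Cargill Grain Co.", "Cargil Grain Co."),   # differs
]

perfect_score = right_minus_wrong(sample_pairs, checked={1, 2}, attempted=4)
careless_score = right_minus_wrong(sample_pairs, checked={0, 1}, attempted=3)
```

Under these assumed rules a subject who works quickly but inaccurately can earn a score below zero, which is the penalty for guessing that the formula is meant to impose.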


The reader interested in research evidence of the test's 
reliability and validity, and in information regarding adult 
norms and the relationship between test scores and other vari- 
ables is referred to the references on page 156 and especially
to the monograph by Andrew and Paterson (5). The follow-
ing paragraphs summarize the research very briefly. 


Andrew (5) has presented evidence on reliability which 
indicates that the test yields sufficiently stable results for use 
with individuals. 


Andrew (5) has also presented a considerable body of 
evidence which points to the test as a valuable technique in a
clinical program of educational and vocational guidance or 
selection to eliminate persons not likely to succeed in clerical 
training or employment. The test results correlate highly with 
high school and college teachers’ ratings of clerical aptitude— 
in fact, higher than does a test of general intelligence. They 
also are definitely related to achievement records in typing 
and to criteria of production on clerical jobs as well as to 
supervisors’ ratings of proficiency on the job. The test appears 
to be measuring factors other than academic intelligence or 
clerical training and experience and it is better than other 
tests for differentiating clerical workers from persons in the 
general population. 


The relationship between the two subtests is not suffi-
ciently high to justify using one test alone or combining the
two scores (5). Reading speed is not an important factor 
in the test. 


The method of scoring is that of “right minus wrong.” 
Despite certain criticisms of this technique (7), it can be 
upheld on logical bases (10). 

Significant sex differences on test scores have been reported 
(5) for men and women in the general population but not 
for men and women employed in the same type of clerical 
positions.


The test author (1, 2, 3) has made an analysis of the test
to determine the abilities which it is measuring and has con- 
cluded that Test I involves a numerical factor and Test II a 
verbal factor and that both are relatively unrelated to 
academic intelligence, ability to perceive spatial relationships, 
and dexterity with fingers and small tools. 




To secure norms which would be fairly representative of a 
cross-section of junior and senior high school pupils in the 
North Central Association of Secondary Schools, St. Paul, a 
midwestern city of one-quarter of a million population, was 
chosen. Approximately 4,000 pupils in grades eight through 
twelve were given the Minnesota Vocational Test for Clerical 
Workers. This does not represent the entire population for 
these grades. In order to guard against a possible selective 
sampling in the choice of schools, an attempt was made to 
select at each grade level schools representing the upper, 
middle, and lower socio-economic groups. One high school and 
one junior high school, judged to represent each of these three 
groups, were chosen. To guard further against securing a 
selective sampling within the schools, the pupils were tested 
in English classes, since English is a subject required of all 
irrespective of curriculum followed. Table 1 shows the num- 
ber of pupils included in the norms, distributed by grade, sex, 
and school. Schools A and B represent the above-average 
socio-economic groups; schools C and D represent the average, 
and schools E and F were characterized by a large proportion 
of families in the lower socio-economic groups. 

The testing procedure was standardized and adhered to 
throughout the program with testing done in the regular Eng- 
lish classes. The administration of the test was that pre- 
scribed by the test author (4, 5, 6). Personal data items 
including identifying data, date of birth, grade, school, curric- 
ulum, and father’s occupation were filled out by the pupils on 
the last page of the test folder (10) before the test itself 
was administered. Birth dates were checked against school 
records. Additional data, such as high school scholarship per-
centile rank and intelligence test scores,² were collected for
certain pupils and recorded on the personal data sheet. All
tests were rescored at least once.

An important prerequisite to the publication of norms on 
tests which are to be used widely is a careful description of the 
population on which the norms were based. Only in this way
are test consumers able to determine whether or not the norms
are appropriate and applicable to their local populations.
Before presenting the tables of norms we shall, therefore,
describe the sample of the school population to which we
administered the Minnesota Vocational Test for Clerical
Workers, presenting statistics on intellectual characteristics,
age-grade locations, and socio-economic levels.

²The Aptitude Index of the Van Wagenen Unit Scales of Aptitude, Forms
E, D, or C.


TABLE 1

DISTRIBUTION OF CASES BY GRADE, SEX, AND SCHOOL*

Grade            Total Cases
VIII                 572
IX                   659
X                    788
XI                   808
XII                1,077

Total, male        1,908
Total, female      1,996
Grand total        3,904

[Counts by individual school and by sex within each grade are illegible in the source copy.]

*Letters designate the different schools.

Table 2 describes a large proportion³ of our sample at
each grade level in terms of the central tendency and vari- 
ability of intelligence test scores. This table also gives similar 
figures for the total St. Paul school population for these
grades. There is no necessity that our sample should be strictly
representative of the St. Paul population. That would have
been required if we had desired to develop norms appropriate
only to the St. Paul school population at a particular date.
Our purpose has been to develop norms on a sample judged
to be fairly typical of the school population in North Central
secondary schools and then to describe that sample as
adequately as possible.

³Ninety-three per cent of the total 3,904 cases are so described. No intelli-
gence test score for the remainder could be located but there is no reason to
suspect the operation of a selective factor here.


TABLE 2

COMPARISON OF THE MEANS OF THE ST. PAUL SCHOOL POPULATION* AND A LARGE
PERCENTAGE OF OUR SAMPLE ON THE BASIS OF THE UNIT SCALES OF
APTITUDE “APTITUDE INDEX”

                                                                        Diff. /
Grade  Groups                          N       Mean      S.D.    Diff.   S.D. Diff.

VIII   St. Paul School Population    2,327  105.3567  14.7982
       Our Sample                      544  107.7757  14.7785   2.4190    3.4366
IX     St. Paul School Population    2,469  104.7833  13.7340
       Our Sample                      564  108.4929  13.2920   3.7096    5.9430
X      St. Paul School Population    2,299  104.9674  13.1065
       Our Sample                      716  106.0964  12.6935   1.1290    2.0610
XI     St. Paul School Population    1,850  106.8270  12.2660
       Our Sample                      758  106.7084  12.0660    .8814    1.6866
XII    St. Paul School Population    1,323  108.3447  11.2040
       Our Sample                    1,045  107.3470  11.9260   -.9977    1.4996

*These figures for the St. Paul school population were provided through the
courtesy of Professor M. J. Van Wagenen of the University of Minnesota.


It can be seen that our sample differs from the St. Paul 
school population by from less than one to less than four 
points on the average, depending upon the grade. These 
small differences are more significant statistically for grades 
eight and nine than for grades ten, eleven, and twelve, as can 
be seen from the ratios of differences to the standard devia- 
tions of those differences. Using this test of representative- 
ness, then, we can say that our tenth, eleventh, and twelfth 
grade students are, on the average, more like the St. Paul 
school population from which they were drawn than our eighth 
and ninth grade students. The differences are small, however, 
and it is not necessary for our purpose that the sample be 
exactly equivalent to this particular population. Furthermore, 
the slight differences in average scores on an intelligence test 
would probably not significantly affect the distribution of scores 
on the clerical test which is not measuring intelligence to any 
great extent. 
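
The ratio of a difference to its standard deviation can be recomputed from the figures printed in Table 2. The article does not state the formula used, so the sketch below assumes the usual standard error of the difference between two independent means, the square root of the sum of each squared standard deviation divided by its N. Under that assumption the result comes out close to the tabled ratios for grades VIII and IX. The function name is hypothetical.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Assumes independent groups: SE = sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se_diff


# Figures as printed in Table 2: St. Paul population first, our sample second.
grade_viii = critical_ratio(105.3567, 14.7982, 2327, 107.7757, 14.7785, 544)
grade_ix = critical_ratio(104.7833, 13.7340, 2469, 108.4929, 13.2920, 564)

print(f"Grade VIII: {grade_viii:.2f}")
print(f"Grade IX:   {grade_ix:.2f}")
```

Because the sample was drawn from the same school population with which it is compared, the two groups are not strictly independent, so the agreement with the table suggests the formula but does not confirm it.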


As a further description of our sample, Table 3 shows 
the percentage of each age represented in each of the five 
grade groups.* The age is that at the nearest birthday. 


A still further description of our sample was obtained by 
determining from the pupil’s statements the occupation of the 
father and then distributing these occupations according to the 
categories of the Occupational Rating Scale’ of the University 





4The reader may be interested in noting the resemblances between this dis- 
tribution and that which Terman and Merrill used for the standardization of 
the revised Stanford-Binet test. L. M. Terman and M. A. Merrill, Measuring 
Intelligence (New York: Houghton-Mifflin, 1937), p. 17. 

“Florence L. Goodenough and John E. Anderson, Experimental Child Psy- 
chology (New York: Appleton-Century, 1931), pp. 501-12. 


149 











EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


TABLE 3 


AGE-GRADE DISTRIBUTION IN PERCENTAGES FOR CASES INCLUDED 
IN THIS STUDY 








Age 





Grade N 12 13 14 15 16 17 18 19 20 21 

















VIII 572 6 35 43 10 4 1 
IX 659 3 39 42 12 4 
x 788 5 35 41 16 2 1 
XI 808 7 36 40 12 4 1 
XII 1,077 5 38 38 14 3 1 





of Minnesota Institute of Child Welfare. A total of 3,347 of 
our 3,904 cases were so classified from occupations as given 
by the pupils whose fathers were living, employed in an urban 
community, and not on relief. There were 557 cases not classi- 
fied, and these included some for which the information was 
inadequate. The elimination of these groups from the classifi- 
cation, therefore, tends to give a slightly distorted picture of 
our sample, weighting it for the upper socio-economic levels. 
Such elimination was necessary for comparative purposes with 
figures available for the United States population and a similar 
urban community, Minneapolis. 
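The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the grade totals from Table 3 and the number of classified cases given in the text; the variable names are illustrative.

```python
# Cases per grade, as listed in Table 3.
grade_n = {"VIII": 572, "IX": 659, "X": 788, "XI": 808, "XII": 1077}

total = sum(grade_n.values())        # all cases in the study
classified = 3347                    # cases with a classified paternal occupation
unclassified = total - classified    # cases left out of the classification
pct_classified = 100 * classified / total

print(total, unclassified, round(pct_classified, 1))
```

About six cases in seven were classified, which gives a sense of how much the excluded cases could shift the occupational distribution.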


Table 4 presents the results of this classification for each 
grade and for those of our total sample who were classified. 
Comparisons may be made first with the distribution for Min- 
neapolis. Our sample appears to be slightly skewed towards 
the higher occupational levels. Part of this is accounted for in 
the number who were unclassified. Despite that, 46 per cent of 
our sample have fathers in the upper three occupational 
groups, and 49 per cent of the Minneapolis male population 
are in these three groups. There are more striking discrep- 
ancies, however, when our sample is compared with that for 
the male population of the United States as a whole. It is 




TABLE 4 

PARENTAL OCCUPATIONS OF CASES IN THIS STUDY. DISTRIBUTION OF THE KNOWN 
OCCUPATIONS OF FATHERS WHO WERE LIVING, EMPLOYED IN AN URBAN COMMUNITY, 
AND NOT ON RELIEF 

Columns: Occupational Class; N and % for Grade VIII, Grade IX, Grade X, 
Grade XI, Grade XII, and Total Sample; % for Mpls. Pop.; % for U.S. Pop. 

Rows: 
I. Professional 
II. Semi-professional and managerial 
III. Clerical, skilled trades, retail business 
V. Semi-skilled, minor clerical and business 
VI. Slightly skilled 
VII. Day laborers, urban and rural 
Total classified (N): Grade VIII, 487; Grade IX, 569; Grade X, 663; 
Grade XI, 698; Grade XII, 930; Total Sample, 3,347 

[The remaining cell entries are not legible in this copy.] 

Figures calculated from the 1920 census. From F. L. Goodenough, The 
Kuhlmann-Binet Tests for Children of Pre-School Age (Minneapolis: University 
of Minnesota Press, 1928), Table 31, p. 37. 







unlikely that norms developed in a single community would 
be typical of all communities. Norms developed in a single 
locality, when well described, are more useful than those 
derived from many diverse populations, a combination of 
which may not be typical of any one situation. 


Table 5 presents the condensed grade norms⁶ for the 
decile points for boys and girls separately in grades eight 
through twelve on the number checking (Test I) and name 
checking (Test II) tests of the Minnesota Vocational Test for 
Clerical Workers. These norms were derived from ogive 
curves constructed from the distributions of cases including 
all ages within each grade. The number of cases and descrip- 
tion of subjects at each grade level have been reported earlier 
in this article. The test user who wishes to apply this test 
to subjects at these grade levels should consider whether his 
population is similar to the one used for calculation of these 
grade norms. 
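One way to picture how decile points are read from a cumulative (ogive) distribution is sketched below. The function name and the sample scores are hypothetical, and linear interpolation between ranked scores stands in for the graphical smoothing of a hand-drawn ogive; it should not be read as the procedure actually used for Table 5.

```python
def decile_points(scores):
    """Return the scores at deciles 0 (lowest) through 10 (highest).

    Uses linear interpolation between ranked scores, a simplification
    of reading values off a smoothed ogive curve.
    """
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    n = len(ordered)
    points = []
    for d in range(11):
        pos = d * (n - 1) / 10          # fractional rank of this decile
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        points.append(ordered[lo] + frac * (ordered[hi] - ordered[lo]))
    return points


# Hypothetical raw scores for a small group of pupils.
sample = [57, 63, 67, 72, 77, 81, 86, 91, 99, 110, 135]
print(decile_points(sample))
```

A pupil's raw score can then be placed between two adjacent decile points of the appropriate grade and sex group.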


Some persons will prefer age norms, and for this reason 
we are presenting in Table 6 age norms for these same sub- 
jects who were enrolled in grades eight through twelve. We 
recommend the use of the grade norms whenever possible, 
however, as they represent actual grade populations which 
have been described. The age norms do not include an entirely 
representative sampling at these ages since we included only 
those pupils enrolled in school in grades eight through twelve. 
Furthermore, unequal numbers were selected at the various 
grade levels. Actually, however, the similarities between age 
and grade norms are more striking than the differences. The 
grade eight norms are similar to the age norms for fourteen- 
year-old pupils, for example. Also notice the striking resem- 
blance between the norms for eleventh grade and seventeen- 
year-old pupils. 


In conclusion, it is suggested that the grade norms should 





⁶Complete percentile norms are available in reference 10 and from the test 
distributors, The Psychological Corporation, 522 Fifth Avenue, New York City. 
For all practical purposes, however, the less refined interpretations of the test 
scores will be all that are required. 




TABLE 5 
CONDENSED GRADE NORMS FOR BOYS AND GIRLS IN GRADES EIGHT THROUGH TWELVE ON 
TESTS I AND II OF THE MINNESOTA VOCATIONAL TEST FOR CLERICAL WORKERS 

           Score (Males)     Score (Females) 
Deciles    Test I  Test II   Test I  Test II 

Grade VIII: Males (N = 284), Females (N = 288) 
10            140      135      165      160 
9             108       99      123      120 
8             101       91      114      108 
7              93       86      110      102 
6              89       81      105       96 
5              85       77      100       92 
4              80       72       95       87 
3              76       67       90       82 
2              7?       63       85       78 
1              65       57       76       70 
0              50       35       25       35 

Grade IX: Males (N = 332), Females (N = 327) 
10              …      165      180      180 
9               …      118      131      128 
8               …      105      122      119 
7               …       94      116      111 
6               …       88      111      105 
5               …       83      107      101 
4               …       79      103       9? 
3               …       74       98       91 
2               …       69       91       84 
1               …       60       83       75 
0               …       30       50       40 

Grade X: Males (N = 372), Females (N = 416) 
10              …      170      185      180 
9               …      121      144      140 
8               …      109      133      130 
7               …      102      127      121 
6               …       96      120      113 
5               …       91      114      106 
4               …       86      109      100 
3               …       81      104       9? 
2               …       73       97       89 
1               …       65       87       79 
0               …       35       55       45 

Grade XI: Males (N = 381), Females (N = 427) 
10            180      195      190      190 
9             131      132      149      147 
8             121      118      140      137 
7             111      110      133      129 
6             106      103      128      124 
5             10?       97      122      118 
4              96       91      117      113 
3              91       86      111      106 
2              85       79      10?      100 
1              76       71       97       90 
0              45       40       55       55 

Grade XII: Males (N = 539), Females (N = 538) 
10              …      180      195      195 
9               …      141      151      153 
8               …      127      142      145 
7               …      119      135      137 
6               …      111      129      131 
5               …      103      122      124 
4               …        …      118      117 
3               …       92      112      109 
2               …       85      106      102 
1               …       78       97       92 
0               …        …        …        … 













TABLE 6 
CONDENSED AGE NORMS FOR BOYS AND GIRLS IN GRADES EIGHT THROUGH TWELVE ON 
TESTS I AND II OF THE MINNESOTA VOCATIONAL TEST FOR CLERICAL WORKERS 

           Score (Males)     Score (Females) 
Deciles    Test I  Test II   Test I  Test II 

Age 14: Males (N = 246), Females (N = 297) 
10            142      162      162      172 
9             114      115      131      129 
8             107      103      122      118 
7             101       94      117      111 
6              95       89      112      105 
5              89       83      107      100 
4              85       78      103       95 
3              80       73       97       90 
2              75       68       92       83 
1              69       62       84       75 
0              42       32       27       37 

Age 15: Males (N = 323), Females (N = 345) 
10            162      167      182      192 
9             124      121      138      139 
8             110      111      130      129 
7             105      101      121      120 
6              99       94      115      112 
5              94       89      110      106 
4              90       83      105      100 
3              86       78      100       95 
2              80       70       94       88 
1              72       60       85       77 
0              52       37       52       47 

Age 16: Males (N = 362), Females (N = 411) 
10            182      192      187      187 
9             127      135      145      146 
8             117      115      137      136 
7             109      105      131      128 
6             104       97      123      121 
5             100       91      118      113 
4              94       86      112      105 
3              88       81      107      100 
2              83       75      100       93 
1              77       68       90       84 
0              47       37       57       47 

Age 17: Males (N = 433), Females (N = 454) 
10            177      177      192      182 
9             135      137      150      150 
8             125      122      140      140 
7             117      112      133      132 
6             110      105      128      125 
5             104      100      122      118 
4              99       94      116      111 
3              93       88      110      104 
2              86       81      105       97 
1              78       71       96       83 
0              47       37       57       47 

Age 18: Males (N = 268), Females (N = 259) 
10            182      177      177      172 
9             135      130      152      150 
8             124      122      142      139 
7             114      113      134      131 
6             107      105      128      125 
5             102       99      122      118 
4              98       90      118      111 
3              94       85      113      105 
2              8?       79      107       97 
1              79       70       97       87 
0              52       42       32       62 





be useful in junior and senior high schools and commercial 
business colleges which are concerned with the distribution of 
their pupils to the commercial classes upon the basis of aptitude 
for clerical work rather than upon the basis of such factors as 
lack of aptitude for college training. This test with its norms 
should contribute toward a more scientific and wiser counseling 
of pupils based upon all of the pertinent and available infor- 
mation. Norms for adults employed in clerical occupations 
also might be employed in judging a pupil’s clerical aptitude. 




REFERENCES 


1. Andrew, Dorothy M. “An Analysis of the Minnesota Vocational Test for 
Clerical Workers. I,” Journal of Applied Psychology, XXI (1937), 18-47. 

2. Andrew, Dorothy M. “An Analysis of the Minnesota Vocational Test for 
Clerical Workers. II,” Journal of Applied Psychology, XXI (1937), 139-72. 

3. Andrew, Dorothy M. “An Analysis of the Minnesota Vocational Test for 
Clerical Workers,” Ph.D. thesis, University of Minnesota Library, 1935. 

4. Andrew, Dorothy M. “The Construction and Standardization of a Test for 
File Clerks,” Master’s thesis, University of Minnesota Library, 1931. 

5. Andrew, Dorothy M. and Paterson, Donald G. “Measured Characteristics of 
Clerical Workers,” Bulletin of the Employment Stabilization Research 
Institute, University of Minnesota, III: 1 (1934), 60. 

6. Andrew, Dorothy M. and Paterson, Donald G. “Minnesota Vocational Test 
for Clerical Workers. Manual of Directions.” New York: The Psychological 
Corporation, 522 Fifth Avenue, 1939. 

7. Candee, Beatrice and Blum, Milton. “A New Scoring System for the 
Minnesota Clerical Test,” Psychological Bulletin, XXXIV (1937), 545. 

8. Dvorak, Beatrice. “Differential Occupational Ability Patterns,” Bulletin 
of the Employment Stabilization Research Institute, University of Minnesota, 
III: 8 (1935), 46. 

9. Green, Helen J., Berman, I. R., Paterson, D. G., and Trabue, M. R. “A 
Manual of Selected Occupational Tests for Use in Public Employment Offices,” 
Bulletin of the Employment Stabilization Research Institute, University of 
Minnesota, II: 3 (1933), 31. 

10. Schneidler, Gwendolen G. “Further Studies in Clerical Aptitude,” Ph.D. 
thesis, University of Minnesota Library, 1940. 

11. Stead, William H., Shartle, Carroll L., and Associates. Occupational 
Counseling Techniques. New York: American Book Co., 1940. 273 pages. 













EXAMINING EXAMINERS 


NORMAN J. POWELL 
New York City Civil Service Commission 


THE EXAMINATION of applicants and the establishment 
of lists of persons eligible for appointment to professional 
positions in any school system is a most important 
task. In New York City, the examining work is performed 
by a board of seven examiners selected as the result of com- 
petitive examination given by the Municipal Civil Service 
Commission. In view of the considerable current interest in 
the matter of examinations for teacher and administrative 
personnel, a somewhat detailed description of the procedures 
used by the New York City Civil Service Commission in the 
most recent test given for examiner may be of suggestive 
value. 

It should be noted that civil service examinations, by their 
nature, are subject to peculiar and serious difficulties. Since 
examinations cannot be repeated it is not ordinarily practicable 
to obtain evidence as to the validity of specific test material; 
the selection and use of such material must rest largely upon 
judgment and indirect evidence. The passing mark is often 
arbitrarily set by law — usually at the 70 or 75 per cent point 
— necessitating nice judgment on the part of examiners as to 
the difficulty of the test material in relation to the calibre of 
the applicants; if the examiners do not judge the situation 
accurately, it is then necessary to resort to the transformation 
of scores. 

Since the examinations are given as a public service and 
the system depends upon public approval, the examinations 
must give the appearance of being just and reasonable, even 




to the person who knows nothing about examinations. Finally, 
elements of the examination procedure are subject to appeal 
and review by the courts; it must, therefore, be defensible 
before judges who know nothing about examination techniques. 
No model procedure has yet been developed to meet all needs 
and situations. The following account presents one careful 
and painstaking approach to the specific problem at hand. 

A statutory requirement in the New York Education Law, 
adopted in 1937, provides that applicants for examiner 
positions must be college or university graduates and possess 
at least five years of public school teaching experience. To 
this minimum qualification, the Civil Service Commission 
added the further requirement of three years of 
administrative experience in the field of education. Applicants 
were also required to be not more than 49 years of age at 
the time of filing application. In consequence of a state law, 
only residents of New York State were permitted to compete 
in the examination. 

A total of 114 applications was received. Of these, 88 
were adjudged as meeting the education, experience, age, and 
residence requirements. 


The Written Test 


The New York City Civil Service Commission designated a 
special committee to prepare the written test.¹ The committee 
consisted of Dean Ned H. Dearborn of New York University, 
President Paul Klapper of Queens College, and Director 
Paul M. Mort of the Advanced School of Education, Teachers 
College, Columbia University. 

The written test, weighted 4, together with an oral test, 
weighted 2, and an evaluation of candidates’ training and 
experience, weighted 4, comprised the entire examination. 
Candidates had to pass the written test to be admitted to the 
oral test, and had to pass the oral test to be eligible to have 
their training and experience evaluated. 
Only 61 of the 88 qualified persons appeared for the written 





¹The writer served as aide to each of the committees who worked with the 
examiner test. 




test. Three applicants withdrew after part of the examina- 
tion, leaving a total of 58 candidates. 
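The weighting of the three components can be illustrated with a short sketch. The source gives only the weights (written 4, oral 2, training and experience 4); treating the final mark as a weighted mean of percentage marks is an assumption made here, and the marks shown are invented.

```python
# Weights stated in the text; combining them as a weighted mean is an assumption.
WEIGHTS = {"written": 4, "oral": 2, "experience": 4}


def final_average(marks):
    """Weighted mean of the three component marks, each on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / total_weight


# Hypothetical candidate.
print(final_average({"written": 80.0, "oral": 70.0, "experience": 90.0}))
```

Under these weights the oral test counts for one fifth of the final mark, while the written test and the experience rating count for two fifths each.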

The written test was divided into four equally weighted 
parts. To pass, a candidate needed a mark of 65 per cent in 
each part as well as a general written average of 75 per cent. 
The test was in neither traditional objective nor traditional 
essay form. The abilities to 
be measured did not appear to lend themselves to usual objec- 
tive test treatment. The precise abilities taken for measure- 
ment may be exemplified by reference to both the questions 
used in the test and the directions given to candidates. 
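The passing rule for the written test, a minimum in each part together with a minimum general average, can be expressed as a simple check. The function name is illustrative, and the marks in the example are hypothetical.

```python
def passes_written(part_marks, part_minimum=65.0, average_minimum=75.0):
    """Apply the stated rule: 65 per cent in each of four equally
    weighted parts and a general written average of 75 per cent."""
    if len(part_marks) != 4:
        raise ValueError("the written test has four parts")
    average = sum(part_marks) / 4
    return all(m >= part_minimum for m in part_marks) and average >= average_minimum


print(passes_written([80.0, 75.0, 70.0, 75.0]))  # meets both conditions
print(passes_written([90.0, 90.0, 90.0, 60.0]))  # fails the part minimum
```

The second case shows that a high average does not compensate for one part below the minimum.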

In Part I, for which the candidate was allowed three 
hours, the applicant was informed: 

“In rating this paper consideration will be given to clarity in defin- 
ing the problem, cogency of facts used, orderly presentation of thought, 
conciseness of expression, and the general effectiveness of the analysis and 
discussion.” 

A single three-hour essay was to be written on one of five 
problems of which the following is illustrative: 


“It has been suggested that an examining board concerned with 
improvement of its techniques should maintain a research division. 


“Analyze this proposal discussing the functions, the organization, 
the personnel, the values, and the limitations of such a division.” 


Another example is: 

“It is maintained that in examinations for promotion there must 
be full recognition of the contributions which the candidate made in the 
subordinate position. 

“Discuss this policy of recognition from the point of view of an 
examiner in a school system.” 

The second part, requiring four hours, consisted of 25 
technical questions. The directions were similar to those given 
for the first part. An illustrative question is: 

“A reliability coefficient of .55 can be said to be typical for rating 
personality traits by ordinary judgment methods. Is this coefficient 
high or low? What basis do you have for your answer? What 
difficulties are involved in the interpretation?” 

In another type of question requiring a longer response, 
the candidate was directed to assume the establishment of a 
new supervisory position in the Department of Education, 
was given the duties of the position, the requirements for 




which had been set up, and was asked to state whether he 
agreed with the requirements established, whether there should 
be additional requirements, and to give a critical analysis of 
the statements regarding the oral test to be given. 

The form of Parts III and IV was similar. In each the 
applicant was told: 

“This part of the examination is a test of your ability to analyze 
a given problem, to document your position, and to employ sound reason- 
ing. The examiners will be concerned with your ability to present 
evidence, not with the nature of your attitudes. You are required to 
demonstrate the depth and breadth of your scholarship in answering 
these questions.” 

Four hours were allowed for the completion of each part. 
There were 40 items in each. Typical questions in Part III are: 

“5. ‘Education is a phase of civilization, not the whole.’ What 
are the implications of this statement for education as one 
of the societal agencies? 

“9. ‘No philosophy of education is fundamental until it is based 
on sociology—not on physiology, not even on psychology, but 
on sociology.’ Is this a valid statement? Why? 

“15. Is it possible for education to be non-partisan? Why?” 

Examples of the questions in the fourth part are: 

“5. ‘The very fact of contact between two cultures tends to 
engender features new to both.’ What basis is there for this 
statement? 

“17. ‘Inertia conditions the solid framework of society and makes 
culture possible.’ Is this a valid statement? Why? 

“32. ‘As a group, the aged are increasing faster than the general 
population.’ Enumerate three highly significant social 
effects.” 

An effort was made to eliminate the deficiencies of cus- 
tomary essay testing and to introduce the major advantages 
of the objective test, while retaining the virtues of essay test- 
ing. It may be pointed out that the considerable length of 
the examination made possible both intensive and extensive 
sampling. The type of item used in Parts III and IV has 
been subjected to quantitative analysis by the Social Science 
Research Council with the finding that it is an “extraordinarily 
useful” instrument. Dr. Brigham is of the opinion that 




the kind of examination question utilized in the third and 
fourth parts of the test measures “breadth of background,” 
though the directions in the test under consideration here 
state that depth of scholarship is also to be demonstrated.² 
Certainly Part I approaches closely the appraisal of depth 
aspects and Part II probes depth to a somewhat lesser degree 
than it evaluates breadth. 


The central difficulty involved in essay examinations is 
unreliability of rating. To promote objectivity in rating the 
last two parts of the test, the extent of the response by candi- 
dates was constricted temporally and therefore spatially. A 
seven-point rating scale was employed in which the charac- 
teristics of best, mediocre, and poor answers were recorded 
to serve as guides for the awarding of credits. Definite key 
answers were formulated to all the questions in Parts I and 
II of the test and a scale for the allocation of credits was set 
up. In all parts of the test, rating keys were constructed by 
two or three examiners in conference, partially by reference 
to relevant literature or other sources, partially by examin- 
ing candidates’ answers to provide a realistic scoring basis. 
Marking was performed independently by two or three exam- 
iners who, after the completion of the scoring, compared their 
ratings. All discrepancies except those of a trivial character 
were noted and candidates’ answers were reread to find an 
equitable base for agreement by the raters as to the mark 
merited by the specific answer. Final ratings were the mean 
of the individual examiners’ marks. 
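The last two steps of the rating procedure, comparing the independent marks and averaging them, might be sketched as follows. The source does not say how large a discrepancy counted as trivial, so the `trivial_gap` threshold is a hypothetical parameter, and the marks are invented.

```python
from statistics import mean


def final_rating(marks, trivial_gap=1.0):
    """Return the mean of the examiners' marks and a flag showing whether
    their spread exceeds a (hypothetical) threshold for rereading."""
    if len(marks) < 2:
        raise ValueError("at least two independent marks are expected")
    needs_rereading = (max(marks) - min(marks)) > trivial_gap
    return mean(marks), needs_rereading


print(final_rating([5, 6]))      # close agreement
print(final_rating([3, 6, 6]))   # one examiner differs markedly
```

In the procedure described, a flagged answer would be reread and discussed before the marks were settled; the sketch only identifies which answers would need that step.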


For the type of examination employed in Parts III and 
IV, a rating reliability of .87 for total score has been found 
for a 50-question, four-hour social science test.³ In a 
40-question examination of similar form given for promotion to 
Captain, Department of Correction, New York City, the cor- 
relation for total test score between two raters was .93. Split- 
half reliability adjusted by the Spearman-Brown formula was 





²C. C. Brigham, Examining Fellowship Applicants (Princeton: Princeton 
University Press, 1935), pp. 22-3. 
³Ibid., p. 14. 




.92, while the standard error of measurement was 3.30.⁴ 
There appears to be fairly substantial evidence that the kind 
of examination constructed is satisfactorily reliable. 
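The two statistics cited for the Captain examination are related by standard formulas, sketched below. The Spearman-Brown step-up and the classical standard error of measurement are textbook expressions; the half-test correlation and the score standard deviation computed here are inferred from the reported .92 and 3.30 on the assumption that those formulas were applied, and neither figure is reported in the source.

```python
import math


def spearman_brown(r_half):
    """Step up a split-half correlation to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def standard_error_of_measurement(sd, reliability):
    """Classical standard error of measurement."""
    return sd * math.sqrt(1 - reliability)


def implied_sd(sem, reliability):
    """Score standard deviation implied by a given SEM and reliability."""
    return sem / math.sqrt(1 - reliability)


r_full = 0.92
r_half = r_full / (2 - r_full)          # half-test correlation implied by .92
print(round(r_half, 3), round(spearman_brown(r_half), 2))
print(round(implied_sd(3.30, r_full), 2))
```

Read this way, a standard error of 3.30 with a reliability of .92 corresponds to a score spread of roughly eleven to twelve points.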

Much of the basis upon which the widespread belief in 
essay test unreliability rests seems to be a derivative of experi- 
mental findings arising from biased investigations. The bias is 
a result of rating without the use of keys so that differences 
between raters are differences between judgments as to the 
nature of correct responses and the magnitude of the credits 
to be awarded to partially correct answers as well as diver- 
gencies in the appraisal of particular responses. It appears 
exceedingly probable that there would be differences of opinion 
among experts in the rating even of many multiple-choice 
questions if the experts were not provided with a scoring key. 
In the present instance the formulation of key answers and 
rating scales, frequent conferences among raters, and the use 
of several raters tend greatly to eliminate unreliability of rat- 
ing in each part of the written test. 

The essential, significant characteristic of a test is its 
validity. Reliability is only of incidental importance since a 
test may be reliable without being valid but cannot be valid 
unless it is also reliable. Unfortunately, in the written test as 
in the oral and experience measures, it is not possible to com- 
pute a validity coefficient in terms of an acceptable criterion 
of ability on the job. No satisfactory criterion exists; only 
one candidate was appointed subsequent to the examination. 

A validity judgment may, however, be predicated upon two 
elements, the backgrounds of the special examining committee 
and the reasonableness of the appearance of the examination. 
Both factors support the belief that the written test is valid. 
The examining panel consisted of prominent educators highly 
experienced in the selection of personnel. Also, the written 
test ranged widely over many subjects apparently pertinent 
to examining work, and its length was sufficiently great to 
make validity an exceedingly probable attribute of the test. 





⁴Bureau of Research, New York City Civil Service Commission, “Selection 
of Captains in the New York City Department of Correction,” Public Personnel 
Quarterly, I: 1, 6-7. 




The final factor of great importance is the differentiating 
capacity of the test. In terms of maxima of 100 per cent, 
the applicants’ scores are set forth below: 


                         Part I   Part II   Part III   Part IV 

Mean ....................  62.1      53.7       47.1      50.0 
Standard Deviation ......  15.1      11.6       12.8      10.0 
Highest Score ...........  94.6      74.4       68.6      77.1 
Lowest Score ............  35.0      26.3       21.4      25.7 
Range ...................  59.6      48.1       47.2      51.4 


Test scores separate well among candidates. Applicants 
are distributed over approximately 50 percentage points, about 
half the total possible range. It is the middle half of the scale 
which is occupied by candidates’ scores. The range is roughly 
from 25 to 75 per cent except for Part I where scores are 
distinctly higher. The highest mark for the written test com- 
bining all four parts was 76.7 per cent. The passing mark 
was 75 per cent. 
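The ranges in the table follow directly from the highest and lowest scores, as the short check below shows. The dictionaries simply restate the tabled values.

```python
# Highest and lowest scores per part, as given in the table above.
highest = {"I": 94.6, "II": 74.4, "III": 68.6, "IV": 77.1}
lowest = {"I": 35.0, "II": 26.3, "III": 21.4, "IV": 25.7}

# Range = highest score minus lowest score, rounded to one decimal place.
ranges = {part: round(highest[part] - lowest[part], 1) for part in highest}
print(ranges)
```

Each computed range agrees with the tabled figure, and three of the four are close to 50 percentage points, which is the spread referred to in the text.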


It follows that either the written test was too difficult or 
the candidates were too poorly equipped. Either conclusion 
suggests the desirability of transmuting original scores into 
higher marks. If the test was too difficult, some of the failures 
should be passing persons. If the applicants are defective, the 
condition is unfortunate but largely the product of the rigid 
statutory requirements limiting applicants to particular groups. 
An extensive publicity campaign had insured that all or prac- 
tically all qualified persons were aware of the opportunity to 
compete in the examination. 

The adjustment of marks involves the necessity for deter- 
mining the nature of the transmutation process. In each test 
part, the mean was taken as the point of reference and denomi- 
nated 75 per cent. Distances in standard deviation units above 
and below the mean fixed the precise percentages awarded to 
candidates. 
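The transmutation described above amounts to a linear rescaling about the mean. The sketch below is a guess at its general form: the source states that the mean was denominated 75 per cent and that distances in standard deviation units fixed the other marks, but it does not say how many percentage points were awarded per standard deviation, so `points_per_sd` is a hypothetical parameter.

```python
def transmute(raw, mean, sd, points_per_sd=10.0, reference=75.0):
    """Rescale a raw part score so that the mean becomes `reference` and
    each standard deviation above or below it adds or subtracts
    `points_per_sd` (an assumed value, not given in the source)."""
    return reference + points_per_sd * (raw - mean) / sd


# Part I figures from the table: mean 62.1, standard deviation 15.1.
print(transmute(62.1, 62.1, 15.1))   # a score at the mean
print(transmute(47.0, 62.1, 15.1))   # a score one standard deviation below
```

Whatever scale factor was actually used, a candidate exactly at the mean of a part receives 75 per cent on that part.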

Thus, of the 58 candidates, 29 were passed in the written 
test. The purpose of the examination was to place on an 
eligible list those persons who appeared to be qualified for 
the position of examiner. With this guiding principle in mind, 
it was considered desirable that the better half of the candi- 




dates taking the written test be given the opportunity to sub- 
mit to further examination. It was considered that all those 
in the upper half of the group had demonstrated the posses- 
sion of a comparatively acceptable minimum of scholarship. 
Rescaling, then, involved taking one point of reference instead 
of another. It was believed that incompetents who managed 
to slip by in this process would be caught in the tests which 
were to follow. An oral test then was administered to the 29 
persons who had survived the written. 


The Oral Test 


The test was given in two parts equally weighted and 
separated in time by about six weeks. The first part was 
designed to measure technical competence, the second set out 
to appraise judgment, clearness and quickness of comprehen- 
sion, manner, appearance, and speech. 

In Part I of the technical-oral test, the 29 persons who 
had passed the written examination were divided into six 
groups, five with five persons and the sixth with four persons. 
For each group a demonstration oral examination was 
enacted. The demonstration orals were ostensibly given for a 
particular job. The particular positions for which the demon- 
strations were held were: teacher of English in the high 
schools, teacher of economics in the high schools, psychologist, 
research assistant, elementary school principal, and director 
of adult education. 

One group of candidates observed one demonstration, a 
second group observed another, and so on. In each case both 
demonstration examiner and subject were members of the 
examining division of the Commission. Into each demonstra- 
tion certain defects and virtues were introduced both with 
regard to the demonstration examiner and the demonstration 
subject. Demonstrations were written and planned in advance. 
Candidates were required to rate both participants in the inter- 
view which was enacted. Each demonstration lasted for about 
one-half hour. Candidates were permitted to take notes while 
the demonstration was in progress and then allowed an addi- 
tional fifteen minutes for note taking. 




Following the demonstration, candidates retired to an 
adjacent room and were summoned individually for an oral 
examination before the examining panel. This oral examina- 
tion lasted for not less than one-half hour. 

Candidates were rated by the panel in accordance with a 
set of directions which had been formulated in advance and 
in accordance with criteria prepared prior to the demonstra- 
tion and adjusted after the demonstration to fit the perform- 
ance which had been observed. The members of the panel 
viewed the demonstration at the same time as the candidates 
in order to be able to adjust accurately the criteria for rating 
to accord with the demonstration observed by the candidates. 

The ratings received by candidates were determined by the 
ratings they had given to participants in the demonstration 
and by the adequacy of the support they were able to adduce 
for their ratings. The candidate’s evaluation of the demon- 
stration examiner was required to be supported by observa- 
tions on the examiner’s attitudes toward the subject, his skill 
in questioning, and the general conduct of the interview. The 
evaluation of the demonstration subject was required to be 
supported by observations on speech, manner, judgment, and 
appearance. 


Of the 29 who had taken Part I of the oral, 16 qualified 
to proceed to the second part. That an applicant was “quali- 
fied to proceed” did not necessarily mean that he passed Part 
I, since a general average of 70 per cent in the oral test as a 
whole was required in order to pass. For example, candidates 
who obtained 60 per cent in the first part of the oral pro- 
ceeded to the second part of the oral with the possibility of 
passing the entire oral test only if they obtained a score of 80 
per cent on the second part, which would give them the 
required average of 70 per cent. The marks received by the 
16 candidates who qualified in the first part of the oral were: 


[Table: Mark, Frequency. The entries for the 16 candidates are illegible in the source.]


Only three persons obtained scores of 70 per cent or better 
in the first part of the oral. It is indicated, then, that the 
large majority of the candidates who appeared for the second 
part of the oral had already exhibited mediocrity with regard 
to technical competence. 
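
The averaging rule described earlier (a candidate with 60 per cent on Part I needed 80 per cent on Part II to reach the required 70 per cent) can be sketched as follows. This is a minimal illustration that assumes the two parts were weighted equally, which the 60/80 example implies but the text does not state outright; the function names are hypothetical.

```python
def oral_average(part_one: float, part_two: float) -> float:
    """Mean of the two oral parts, assuming equal weights (an assumption)."""
    return (part_one + part_two) / 2


def required_second_part(part_one: float, passing_average: float = 70.0) -> float:
    """Part II score needed for the two-part mean to reach the passing average."""
    return 2 * passing_average - part_one


# The example given in the text: 60 per cent on Part I.
needed = required_second_part(60.0)
print(needed)                      # 80.0
print(oral_average(60.0, needed))  # 70.0
```

Under the same assumption, any candidate below 70 per cent on Part I had to score above 70 per cent on Part II in order to pass.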

There is a sizable discrepancy in score between Part I and
Part II in only two cases. In one case, a candidate who had 
obtained 75.8 in the first part received 60.8 in the second. In 
the other case, a candidate who had obtained 74.2 in Part I 
received 59.0 in Part II. The point of these data is that can- 
didates were consistently poor. The fact of the matter is that 
most of the candidates performed in mediocre fashion in the 
first part of the oral and merely confirmed their mediocrity in 
the second part of the oral, already having shown inferiority 
in the written test. 

The 16 individuals who took Part II of the technical-oral 
test were divided into eight groups of two. For each group 
of two persons, examined separately in a particular half day, 
two types of situations were set up. In the first type of situ- 
ation, the candidate was directed to assume that he had been 
appointed to the position of examiner and that he had been 
serving in this position for about five years. The candidate 
was told that he would be visited by a person with whom he 
was to talk for about half an hour and that he was to conduct 
this interview as naturally and effectively as he could. For this 
examination, one person, Professor Robert K. Speer of New 
York University, acted as the visitor and assumed a different 
role for each group of two candidates. The roles assumed 
were: representative of a parent association, assistant examiner,
colleague on the board, reporter, representative of the
Civil Service Commission, visitor from Sioux City interested 
in teaching personnel, failed candidate, and representative 
of a teacher training institution. 

A second type of situation was established after the con- 
clusion of the candidate’s interview with the visitor. This 
consisted of three or more questions. Examples of the first 
kind of question are: 

Can education be reconstructed through research? 

How would the adoption of a particular philosophy of examining 
in New York City affect educational practice and thinking throughout 
the whole country? 

What do you consider to be the major virtues (or defects) of our 
educational system in the United States? 

For the second question, the candidate was given a quo- 
tation, asked to tell whether he agreed or disagreed with the 
quotation and what the implications were of his agreement 
or disagreement for the work of the Board of Examiners in 
New York City. Some typical quotations are: 

“The danger which comes from emphasizing the significance of 
contemporary changes is that hasty and unsound revisions will be made 
in the curriculum.” 

“In the solution to educational problems lies the solution to all 
social problems.” 

“The main function of education is to perpetuate democracy.” 

“Adult education should be limited to educable adults.” 

For the third question, the candidate was required to talk 
for several minutes on any topic which he deemed to have 
implications for the work of the Board of Examiners. This 
was followed, where appropriate, by having members of the 
panel question the candidate directly in order to obtain 
clarification or amplification of one or more points made by 
the candidate. It is noted that members of the panel were 
free to ask questions of the candidate at any time in order to
explore a statement by the candidate. 

The direct questioning by the panel was introduced by 
having the candidate talk for several minutes on his experi- 
ence and background mainly in order to give the candidate an 
opportunity to “warm up” prior to the questioning. The ratings
given to candidates were made in accordance with written
directions adopted by the examining panel. In the situation
in which the candidate assumed that he was already an exam- 
iner, the following rating criteria were employed: soundness 
of position taken, cogency of discussion, clarity of discussion, 
penetration of treatment, time taken for effective organization 
of responses to the visitor, manner and attitude adopted to- 
ward the visitor, quality of speech, and appearance. In 
the situation involving more direct questioning, the follow- 
ing criteria were used: importance of material selected, sound- 
ness of position taken, relevance of material selected, clarity 
of presentation, penetration of treatment, quality of speech, 
manner and attitude adopted toward the panel, time taken for 
effective organization of responses, and appearance. 

Information for the rating of the five factors was supplied 
both by the direct questioning and the assumed examiner 
situations. In the rating of the five factors, a rating scale was 
employed which ranged from 0 to 100 per cent. The stand- 
ards set were: 

“Unacceptable candidates should be given ratings below 60 per cent. 
Ratings between 60 per cent and 75 per cent should be given to candi- 
dates whose characteristics considered in this part of the test are only 
very slightly inferior in level to those of a high-grade examiner. Ratings 
above 75 per cent should be given to candidates who undoubtedly 
possess a high level of the characteristics defined in this part of the 
examination.” 

There were three examiners in the first part of the tech- 
nical-oral: Joseph G. Cohen, Director of the Division of 
Graduate Studies, Brooklyn College; Ned H. Dearborn, Dean 
of the Division of General Education, New York University; 
Margaret V. Kiely, Dean, Queens College. 

Because the second part was less susceptible to objective 
rating, the examining committee was increased to five in 
order to minimize subjectivity: Ned H. Dearborn of New 
York University; Willard S. Elsbree, whose special field 
is teacher personnel, of Teachers College, Columbia Univer- 
sity; Margaret V. Kiely of Queens College; Jesse H. Newlon 
of Teachers College, Columbia University; Ordway Tead, 
President, Board of Higher Education in New York City. 




The traditional deficiencies of oral tests are well known 
and include in civil service examining the difficulty of achiev- 
ing both the fact and the appearance of satisfactory reliability 
and validity. Appearances are of considerable importance in 
public personnel administration. Not only must the examina- 
tion be an effective instrument but also it must avoid the 
impression of being arbitrary, unfair, or capricious even 
though it is none of these in fact. The difficulty is generally 
that of describing adequately the basis for ratings and of 
connecting clearly the rating scale with the candidate’s per- 
formance in the determination of marks. It must be proved 
that marks are accurate and unbiased. This necessity was rec- 
ognized and met by setting forth in writing the nature of the 
scoring scales, criteria, and standards used, and by keeping 
stenotype and phonographic records of all questions and 
answers. The effort to have the examining panels be rather 
large and representative of diverse educational viewpoints 
and to have them consist of leaders in the profession was 
also considered to contribute toward the objective of coupling 
seeming with actual validity. 


The technical requisites for reliability seem to be present. 
The average intercorrelation of the examiners’ ratings on the 
first part was .91, making an estimated reliability for their 
composite ratings of .97, by the Spearman-Brown formula. 
The average intercorrelation among the examiners on the 
second part was .785, making the estimated reliability for 
their composite ratings .95. The average difference among 
examiners in Part I of the oral is 2.2; in Part II the average 
difference is 5.4. Scores were in five-point units as 50, 55, 60, 
65, 70, so that the disparity in grades is about half of one 
point on the Part I rating scale and about one point on the 
scale in Part II.
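
The reliability figures above can be checked with the Spearman-Brown formula the text names, R = k r / (1 + (k - 1) r), where r is the average intercorrelation and k the number of raters. The sketch below takes k = 3 for Part I and k = 5 for Part II from the panel sizes reported earlier. The function name is hypothetical, and dividing the average differences by the five-point step is simply one way to reproduce the "half a point" and "one point" statements.

```python
def spearman_brown(mean_r: float, k: int) -> float:
    """Estimated reliability of a composite of k raters with average intercorrelation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


part_one = spearman_brown(0.91, 3)   # three examiners in Part I
part_two = spearman_brown(0.785, 5)  # five examiners in Part II
print(round(part_one, 2), round(part_two, 2))  # 0.97 0.95

# Average differences among examiners, expressed in five-point scale units.
SCALE_STEP = 5.0
print(round(2.2 / SCALE_STEP, 2), round(5.4 / SCALE_STEP, 2))  # 0.44 1.08
```

Both composite estimates agree with the reported values of .97 and .95 after rounding.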


Since quantitative appraisal of the validity of the oral 
tests is impossible, the problem must again be approached 
logically. Validity refers to the degree to which a test meas- 
ures what it sets out to measure. The oral attempted to 
evaluate ability to judge applicants for educational positions,
to analyze weaknesses and strengths in oral examining meth- 
ods, to deal with visitors, to display good judgment and com- 
prehension, and to exhibit a satisfactory appearance, manner, 
and speech. Situations were formulated with the explicit pur- 
pose of measuring these factors, all of which appear to be 
significant samples of the examining task, so that from the 
viewpoint of job analysis the oral appears to be acceptably
valid.

The Combined Scores


When the scores for both parts of the oral were combined, 
it was found that only one candidate had achieved a passing 
rating. At this point in the examination, however, adjustment 
of marks to pass a greater number was deemed undesirable. 
There are several reasons for not transmuting marks in the 
oral test. In the oral, the identity of candidates is known. In 
the written, identity is concealed by having candidates enter 
their application numbers in place of their names.⁵ To transmute
marks where identity of applicants is known is to make
possible the charge of manipulation. Further, the written 
was followed by other tests able to weed out the unfit; the 
oral was to be followed by an experience test in which every 
person admitted to the examination was certain to receive a 
passing mark because all possessed the prescribed minimum 
education and experience qualifications. Moreover, very sub- 
stantial opportunity had been afforded to aspirants for the 
position of examiner to prove themselves. Applicants had 
been examined at four separate occasions in the written test 
and at two different times in the oral. Finally, the matter of 
standards is highly relevant in deciding whether or not to 
rescale marks. 


5The practice of the New York City Commission is to affix rating numbers
to all written test papers and to detach the application number from answer
sheets. The applicant knows his application number, but not his rating number.

The position of examiner is of the utmost importance in
a school system. The examiner is responsible for the selection
of educational personnel and therefore, in a large measure,
for the quality of the teaching done and the manner in
which the youth of the city is taught and molded. The position
pays $11,000 a year and is held for life after a six-month
probationary period which is not made effective since no appointee
to this position has ever been discharged after probationary
appointment. It also must be borne in mind that
educational practice in New York City affects to a degree
educational practice in the remainder of the country. It seems
reasonable to believe that under these conditions a high standard
is desirable for this position. The position was taken
by the Commission that passing only one of 58 persons taking
an examination of this type is not evidence of an unjustifiably
high set of standards when the number of jobs to be filled is
very small.


A great row arose after the eligible list of a single name 
was published. It would be interesting and instructive to take 
up the controversy in detail, but such a discussion belongs 
elsewhere. Some of the objections can be laid to a lack of 
understanding of fundamental measurement principles. 


The examination was reviewed three times by the courts 
and once by a committee on manifest errors established by the 
New York City Civil Service Commission. First of the many 
and varied interpretations of the data came with the appoint- 
ment of the committee on manifest errors to hear and judge 
candidates’ appeals. The usual procedure of the Commission 
is to refer all appeals to a board of three members of the 
Commission staff. In view of the importance of this examina- 
tion, however, and the necessity of eliminating any suspicion 
of bias, the special panel was constituted. Its personnel con- 
sisted of Arthur A. Ballantine, noted lawyer and Undersecre- 
tary of the United States Treasury under President Herbert 
Hoover; Charles J. Pieper, Professor of Science Education 
and head of the Department of Science Education at New 
York University; William F. Russell, Dean of Teachers Col- 
lege, Columbia University. In a report dated May 3, 1938, 
the Commission found “no manifest error in the examining 
methodology or in the constitution of the examining panel,” 
stated that “the pass mark was set neither too high nor too
low in relation to the level of competency required,” and concluded
that there was “no manifest error in the rating of any
candidate.”

Suit to invalidate the test was then brought by seven of the 
failed candidates. It was held by the New York Supreme 
Court “that the technical-oral test against which the principal 
assault was made was meticulously prepared and impartially 
administered, that every safeguard to insure fairness and 
equality of competition was provided, and that the standards 
used in rating the competitors were in legal contemplation 
objective and reviewable.” 

The failed candidates had greater success with the State 
Appellate Division. Five justices of the Appellate Division 
concurred in finding the oral examination invalid. The justices 
disagreed quite strongly in regard to the selection of the 
ground upon which to rest their conclusion. One stated that 
it was illegal to limit the eligible list to one name; a second 
was impressed with the “comparative incompetence” of the 
sole passing applicant; others interpreted the evidence to point 
to the intrusion of ideological considerations in the technical- 
oral test. 

The final word came with the decision of the Court of 
Appeals which disagreed with the Appellate Division as to 
why the oral test was illegal but agreed that the test should 
be given all over again. 

Following these vicissitudes, the New York City Civil 
Service Commission held a new oral test in 1940. This time, 
three candidates were passed. The applicant who had been the 
only one to qualify in the previous test was included among 
the three who were successful in the new one. 

























NEW CRITERIA FOR OLD 


T. R. SARBIN AND E. S. BORDIN¹
University of Minnesota 


IF ALL the literature on the prediction of college grades
were to be assembled in one place, the outstanding charac- 
teristic would be the almost universal agreement that correla- 
tion coefficients higher than .70 are practically impossible with 
existing methods. As a matter of fact, Segel has collected 
over a hundred such studies only to discover that the median 
predictive validities of high school scholarship, tests of gen- 
eral achievement or aptitude, and tests of specific aptitudes 
or achievements were .54, .44, and .37 respectively.²

In studying the factors which are responsible for these 
relatively low coefficients, our attention is immediately focused 
on the nature of the criterion—the honor-point ratio. Com- 
monly used by colleges and universities as an index of the stu- 
dent’s achievement, this summary figure represents attainment 
in many different kinds of courses taught by various kinds of 
teachers with different standards of measurement. 

1We are indebted to Professor E. G. Williamson for stimulation and advice
in the formulation of this paper. We gratefully acknowledge his permission to
use part of the data contained in his study Prediction of Success in the Arts
College to be published in bulletin form by the University of Minnesota.

2David Segel, Prediction of Success in College (U. S. Office of Education,
Bulletin 1934, No. 15), p. 70. See also: Daniel Harris, “Factors Affecting College
Grades: A Review of the Literature, 1930-37,” Psychological Bulletin,
XXXVII (1940), 125-66.

Two characteristics of this criterion are of importance for
predictive efficiency, namely, its unreliability and its heterogeneity.
The first characteristic, unreliability, has not really
been measured effectively, but can be estimated by logical
analysis. It is agreed that even with improved methods of
measuring attainment in college courses a semi-intuitive, hit-or-miss
judgmental factor still remains in the grading process.³
That this would create a measure of unreliability in the individual
course grade is undeniable. As long as each teacher
has a set of standards, individually derived and reflecting a
somewhat unique set of objectives, so long will grades retain
their unreliability. When we compound the unreliabilities of
the individual course grades, which we do in computing honor-point
ratios, it is improbable that the reliability of the final
criterion will approach the reliability of the predictors.


If it were possible to establish perfect reliability of course 
grades in individual subjects and of the honor-point ratio, the 
second characteristic of the criterion, heterogeneity, would still 
remain to interfere with prediction. For students who are 
taking courses in natural sciences, mathematics, social sciences, 
and languages in varying combinations, the criterion repre- 
sents a complex of many factors each of which logically ought 
to be sampled by the components of the predictive battery. It 
is self-evident that the more complex and heterogeneous the 
factors in the criterion, the more difficult becomes the task of 
assembling a predictive test battery which will adequately sam- 
ple this aggregate without simultaneously introducing into the 
predictive index other extraneous factors. Our task would 
be solved if we could assemble a series of pure measures for 
each component in the criterion. Pure tests, however, have 
not yet been created. The early promise of the factor analysts 
that a pure test was possible has not yet been realized. 


3This is not to imply that judgments are to be abandoned. By learning to
avoid the pitfalls and fallacies in human judgments, teachers can improve the
quality and consistency of their ratings. Several writers have treated at some
length the common errors in making judgments. See H. E. Burtt, Principles
of Employment Psychology (Boston: Houghton-Mifflin, 1926), Chapter II, and
M. S. Viteles, Industrial Psychology (New York: W. W. Norton Company,
1932), Chapters IX, X.

A word of caution is in order for those who would hasten,
after having discussed the unreliability and heterogeneity of
the prevailing criterion, to do something about it. The unreliability
or reliability of a criterion is only one factor in prediction.
Of equal importance are the reliability and validity
of the predictive battery. As already indicated, techniques
have not yet been developed for creating tests which will be
pure measures of any single factor. Thurstone’s utilization
of factor methods in his Primary Abilities tests has not yet
passed beyond the experimental stage. In fact, first reports
have been conflicting.⁴ While the reliabilities of the predictive
tests such as the American Council and the Ohio Psychological
Examination distribute around .90, these alone do not offer
hope for a great deal of improvement in validity. Thus,
research ingenuity must be applied to the predictor variables
as well as to the criterion.


One final limiting aspect of prediction must be taken into 
account by the research worker before pitching his aspirations 
too high. This is the indeterminacy principle that Heisenberg
has formulated for prediction in the physical sciences.
Present-day thinkers recognize that spontaneous and uncon- 
trolled factors are always present. These cannot be foreseen; 
they will introduce a measure of error in any forecast. Among 
such factors to be found in the prediction of academic achieve- 
ment are momentary motivations such as health conditions, 
social distractions, sexual distractions, home conflicts, tem- 
porary moods, sets, fatigue, and so on. Because of these not 
readily controllable elements, it would be safe to guess that 
even with perfectly reliable criteria and with statistically 
infallible predictive tests, the upper limit of multiple correla- 
tion would still not exceed .95. But such a pessimistic outlook 
need not be discouraging to further research. Much room 
remains for improvement. The increase in predictive efficiency 
of a correlation of .70 to one of .95 represents a range of 
about 41 per cent improvement over non-test estimates. 
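
The text does not state the formula behind these percentages. One reading, offered here only as an assumption, is the index of forecasting efficiency, E = 1 - sqrt(1 - r^2), which expresses how much a correlation r reduces the standard error of estimate relative to guessing the mean. The sketch below (the function name is hypothetical) evaluates it at the two correlations mentioned.

```python
import math


def forecasting_efficiency(r: float) -> float:
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r^2) (assumed formula)."""
    return 1.0 - math.sqrt(1.0 - r * r)


current = forecasting_efficiency(0.70)  # practical ceiling cited in the text
ceiling = forecasting_efficiency(0.95)  # conjectured upper limit
print(f"{current:.3f} {ceiling:.3f} {ceiling - current:.3f}")  # 0.286 0.688 0.402
```

On this reading, r = .70 gives roughly the 28 per cent efficiency cited later in the article, and the gap between the two values is about 40 percentage points, close to the 41 per cent stated above.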


4J. M. Stalnaker, “Primary Mental Abilities,” School and Society, L (1939),
868-72. See also R. G. Bernreuter, “Primary Ability Tests Applied to Engineering
Freshmen,” Psychological Bulletin, XXXVI (1939), 548-49, and William
M. Shanner and G. Frederic Kuder, “A Comparative Study of Freshman Week
Tests Given to the University of Chicago,” Educational and Psychological
Measurement, I (1941), 85-92.

The crux of the problem of selection and admission of
students hinges upon accurate prediction instruments. Prediction
serves the purpose of assisting college authorities to
select students who have a reasonable chance of profiting from
the college’s offerings. Since the efforts of the test-makers,
educational psychologists, and other research workers reached
a ceiling at forecasting efficiency of approximately 28 per cent
better than non-test prediction, further research may take any
of three courses:
1. further improvement in the reliability and validity of 
the predictive battery; 
2. improvement in the reliability of the criterion 
measures; 
3. design of a new criterion which will be more predict- 
able and at the same time acceptable to school 
administrators. 


At the present time, the first approach appears to be the
one least likely to bring success, yet it is the one most frequently
selected. With the development of the method of factor
analysis hopes were raised for a significant increase in the
efficiency of tests. The belief prevailed that with the isolation
of factors in a test battery the foundation might be laid for
the construction of purer tests which in turn would lead to
more accurate prediction. Thus far this promise has remained
unfulfilled.⁵ Until now the more significant contribution in
test construction has come from the method of inbreeding of
test items as utilized by Toops in the construction of the
Ohio Psychological Examination.⁶ By means of this continuous
process of selection of the most valid and most stable
items, the predictive validity of the Ohio test has at times surpassed
.60. The promise for further developments from this
source is at present greater than from the method of factor
analysis. The stimulus for further advances by the method
of inbreeding probably will come from studying the contributions
of the alternatives in a multiple choice item.⁷ But even
with this contribution the prospects for the near future are not
very bright for a large increase in reliability and validity
of tests.

5Stalnaker, op. cit. See also Bernreuter, op. cit.

6H. A. Toops, “The Evolution of the Ohio State University Psychological
Test,” Ohio College Association Bulletin No. 113, March 20, 1939, pp. 2267-311.

7G. F. Kuder, The Construction of Valid Test Items (Unpublished Dissertation,
Ohio State University, June, 1937).

The second course, increasing the reliability of the criterion 
measures, sporadically has been the topic of intense discussion 
in educational circles. As far back as 1913 Starch appealed 
for more stable grading standards. The major factors which 
he cited as the cause for instability of marks still are applic- 
able today: 

“(1) Differences among standards of different schools; (2) differences
among standards of different teachers; (3) differences in the
relative values placed by different teachers upon various elements in a
paper; and (4) differences due to pure inability to distinguish between
closely allied degrees of merit.”⁸


In the last decade a new type of emphasis in the grading 
process has arisen largely through the influence of Tyler⁹ and
the Progressive Education Association evaluation work. The 
efforts of this group have been directed mainly toward the 
clarification of teachers’ aims and objectives and the opera- 
tional definition of these aims and objectives in terms of 
observable behavior. These developments have been directed 
chiefly at the secondary school level in connection with the 
Eight-Year Study. 


The wider application of these principles at the college 
level may offer some hope for the improvement of prediction. 
It is assumed that this type of study will lead to a more con- 
scious and a more stable evaluative process which in turn 
should serve to make grades more reliable. Some believe that 
greater homogeneity in the objectives of grades also would 
result from these developments. That is to say, many objec- 
tives probably could be identical for different courses. If a 
core of common objectives could be isolated and evaluated 
in the same manner in a whole series of courses, then the 
difficulty of constructing a more efficient test battery would
be reduced considerably. 





8Daniel Starch, “Reliability and Distribution of Grades,” Science, XXXVIII 
(1913), 630. 


9Ralph W. Tyler, “Needed Research in the Field of Tests and Examinations,”
Educational Research Bulletin, XV (1936), 151-58.




The third approach is most likely, we believe, to bring 
about significant increases in the predictive efficiency of test 
batteries and is one that probably would encounter the most 
opposition from administrators and faculty. If we assume that 
the present grade criterion of college success lacks adequate 
predictability and that this deficiency warrants the substitu- 
tion of a more predictable criterion, then is it not logical to 
seek such a criterion? The answer can be only in the affirmative.
At least a portion of our efforts must be directed at the
possibilities of developing a more predictable measure of 
academic achievement which at the same time will satisfy 
other needs of the educational program. 


But we also must consider the difficulties of such an under- 
taking and be prepared to surmount them. Over and above 
the educational and statistical problems are the sociological 
problems which arise from the nature of our educational 
society. This society has developed a rigid and inflexible 
attitude toward marks which is likely to resist any but very 
strong pressures. 


To dislodge the tradition of marks, two forces must be 
overcome: first, the faculty, who feel that they have a vested 
interest in assigning grades; and second, parents, who, “indifferent
at times to most phases of education, seldom neglect
the report card.”¹⁰ This rigid adherence to marks has another
deleterious effect upon the educational process. Instead of 
directing their efforts toward mastery of content, many stu- 
dents prepare for grades. Originally designed to serve merely 
as a record that a student had taken a particular course and 
had acquired a certain degree of proficiency, grades too often 
have become the only goal for many students. Foerster has 
described the situation in pungent terms: 

“Once a credit was earned, it was as safe as anything in the world.
It would be deposited and indelibly recorded in the registrar’s savings
bank, while the substance of the course would be, if one wishes, happily
forgotten. Each course culminated in a final examination; if one knew
one’s stuff then, one need never know it again. In a subject like required
English, a student deficient in ability might, with effort, get a passing
grade, and then, without effort, pass into semi-illiteracy; yet the record
would show, to the day of doom, that he could read and write
passably.”¹¹

10R. O. Billett, Provisions for Individual Differences, Marking and Promotion
(U.S. Office of Education, Bulletin 1932, No. 17), p. 459.


All this means that institutions of higher learning may 
have to abandon or modify the traditional marking system and 
“produce a new convention better than the old.”¹² It is thus
seen that continued enslavement to traditional marking systems 
not only interferes with the construction of more effective 
selection instruments, but also vitiates some of the funda- 
mental objectives of higher education. Therefore, upon the 
shoulders of the educational administrator falls the respon- 
sibility of re-examining the purposes of a marking system—a 
system that he either implicitly or explicitly has set up as 
proper. In this re-examination he will be obliged to leave the 
way open for the substitution of another marking system 
which will provide optimal satisfaction of these purposes. 


Our problem has come into sharp focus: a new standard 
for gauging achievement in college must be sought. This 
standard must palpably be superior to teachers’ marks and 
must rest on certain logical and statistical pillars. Our pre- 
vious discussion of the limitations upon predictive accuracy 
for college selection purposes has already indicated some 
of the desirable features: first, the measure should have 
reliability; second, it should be as homogeneous as possible 
both with respect to scale and to the nature of the factors 
included; third, it must have relevancy for the educational 
objectives to be measured. 


11Norman Foerster, The American State University (Chapel Hill, N. C.:
University of North Carolina Press, 1937), p. 97.

12Ibid., p. 146.

Since the final test of the predictability of a criterion will
be empirical, we turn to the data that we have to present.
At this juncture the nature of the statistical evidence we have
obtained forces us to particularize in terms of the liberal arts
college, and more specifically, the junior division. The underlying
principles, however, can readily be adapted to other
college units. Before proceeding with the analysis of the comparative
predictability of the two types of criteria, a word
must be said about the objective of the junior division of the
liberal arts college. At the risk of seeming impertinently presumptuous
in stating this objective in a word, the authors suggest
that the primary purpose of the first two pre-specialization
years in the arts college is to provide students with opportunities
for cultural growth. Today’s liberal arts colleges
by and large direct their efforts, sometimes
futilely, toward a cultural goal. Although in this context the
word culture is to be looked upon with the gravest suspicion,
the trend in today’s core curricula seems to be away from the
hot-house variety of culture for the elite and in the direction
of the by-products of the best in science and society for all.
In most cases, the American liberal arts college is bent upon
providing students with a broad understanding of culture in
all of its ramifications.


To be cultured, a man must be “more than an ape-like 
creature posing under the mask of hastily acquired drawing 
room manners.”13 The student must acquire during his pre-specialization
years the individual qualities and competences
which go into rich and satisfying living, and which give mean- 
ing to his experiences as a member of society. This does not 
imply that a cultural pattern rigidly common to all is the goal 
of liberal education. To dragoon widely different students into 
a legion of regimented automatons, each responding in the 
same way to the same situations, is obviously to be deplored in 
democratic institutions. As Eckert has phrased it: “Not like-minded,
but ‘free’ individuals become the goal of teaching.”14
Idiosyncratic behavior remains as an outstanding desideratum 
of liberal education. 

At this point two approaches to evaluating these cultural objectives
are immediately apparent. We may retreat
to the traditional methods of evaluation — teachers’ grades 
based on some esoteric combination of improvised testing, 





13C. J. Warden, The Emergence of Human Culture (New York: Macmillan
Company, 1936), p. 8.

14Ruth E. Eckert, “Who Are the Cultured in Our Colleges?” Educational 
Record, January, 1930, pp. 133-35. 




dazzling intuitions, and the persistence of the student in 
attending classes; or we may turn to procedures such as those 
embodied in certain uniform testing programs. 


Such a measure of culture could be assembled with the 
Cooperative General Culture test as a nucleus.15 Since 1932,
the content of the Sophomore Culture test has been consider- 
ably expanded, and today the student runs a gamut of tests 
from mathematics to aesthetic appreciation before entering his 
junior year. Admittedly, the paper-and-pencil instrument does 
not sample the whole range of culture, neither does it directly 
tap the important areas of motivation, attitudes, and values. 
It is the only method yet devised, however, which has the fun- 
damental characteristics without which scientific measurement 
in education becomes a farce, a tragedy, or both. If college 
administrators and faculties are to decide whether the Soph- 
omore Culture test will satisfy their needs, they must weigh 
it upon scales which carry empirical as well as logical weights. 
If it is agreed that the predictability of a proposed criterion 
is one characteristic pertinent to its adoption, the predictability 
of the Sophomore Culture test becomes a matter of moment. 


One might point out that it has a high reliability (coefficients
in the .90’s are reported in the literature) or that it is
constructed so as to give comparable scores, but the final proof 
of its superior predictability must rest upon obtained correla- 
tions with predictive tests. The remainder of this paper is 
devoted to an exploratory investigation of the predictability 
of the two criteria we have been discussing: teachers’ marks16
and the Sophomore Culture battery. 


The usual technique was employed in assembling a battery 





15The Cooperative General Culture test may be procured from the Cooperative
Test Service, 15 Amsterdam Avenue, New York. The other Cooperative
tests used in this study may be obtained from the same source.


16Teachers’ marks were transmuted to two year honor-point ratios as 
follows: for each credit hour in which an A was recorded, three honor points 
were assigned, for each credit hour of B, two honor points, for each credit hour 
of C, one honor point, for each credit hour of D, no honor points, and for each 
credit hour of F (failing) one honor point was subtracted. The honor-point ratio 
was computed by dividing the total number of honor points earned by the 
total credit hours earned. The Sophomore Culture test was made up of the 
following tests in the Cooperative series for 1936: General Culture, English, 
General Science, Literary Acquaintance. 




of tests for the selection or rejection of applicants. These 
tests were correlated first with teachers’ marks and then with 
scores on the Sophomore Culture tests. The battery of 
entrance tests which has the highest correlation with the 
criterion could thus be used to select future candidates for 
admission. At the time of the students’ entrance into the arts 
college, scores on the following measures were obtained: 


High school percentile rank 

Minnesota College Aptitude test (form AM) 

Minnesota College Aptitude test (form 1926) 

Cooperative English test (Part I, form 1934) 

Cooperative Vocabulary test (Part II of English test 
above) 

Cooperative Contemporary Affairs test (form 1934) 

The group included in this study was composed of students 
who entered the arts college of the University of Minnesota 
as freshmen in the fall of 1934, and who took the Sopho- 
more Culture test in the spring of 1936 in applying for admis- 
sion into the upper division. The group was composed of 138 
students, 56 men and 82 women. Only students were included 
for whom the 1934 entrance test scores were available and for 
whom high school percentiles were recorded. The group 
studied, though not closely representative of entering fresh- 
men, probably was representative of sophomores applying for 
entrance to the senior division of the arts college. Any limi- 
tation in representativeness, however, invalidates no compari- 
sons between different measures within this group. 
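
The two-year honor-point ratio that serves as the grade criterion in Table 1 is defined in footnote 16. The rule can be sketched in a few lines of Python. The input format, a list of (letter grade, credit hours) pairs, is hypothetical, and the sketch reads "credit hours earned" as hours in courses passed with A through D. That reading is an assumption, since the footnote does not say how failed hours enter the divisor.

```python
# Honor points per credit hour, as stated in footnote 16.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def honor_point_ratio(courses):
    """Return total honor points divided by total credit hours earned.

    `courses` is a hypothetical input format: (letter_grade, credit_hours) pairs.
    Assumption: hours in failed courses earn no credit, so they are left out
    of the divisor, although each failed hour still subtracts one honor point.
    """
    points = sum(HONOR_POINTS[grade] * hours for grade, hours in courses)
    earned_hours = sum(hours for grade, hours in courses if grade != "F")
    if earned_hours == 0:
        raise ValueError("no credit hours earned")
    return points / earned_hours


# Illustrative record, not taken from the study.
example = [("A", 3), ("B", 4), ("C", 3), ("F", 2)]
ratio = honor_point_ratio(example)  # (9 + 8 + 3 - 2) / 10
```

If failed hours were instead counted in the divisor, only the `earned_hours` line would change.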


TABLE 1


CORRELATIONS BETWEEN TWO YEAR HONOR-POINT RATIOS AND INDIVIDUAL MEASURES
IN THE FRESHMAN TESTING BATTERY


                                                Total    Men    Women

High school percentile rank ................     .52     .57     .55
Contemporary Affairs test ..................     .50     .53     .44
Minnesota College Aptitude test (1926) .....     .50     .56     .43
Cooperative English test ...................     .41     .50     .43
Minnesota College Aptitude test (AM) .......     .49     .49     .39
Cooperative Vocabulary test ................     .35     .40     .30

Table 1 reveals the usual order of correlations between 
teachers’ marks and predictive tests. The best single predictor 




of grades is the high school percentile rank, demonstrating that 
—to a certain extent—high school teachers and college teachers 
are influenced by the same factors in assigning grades. The 
highest correlation in the table is .57, between grades for men 
and high school percentile ranks. The lowest coefficient, .30, 
is between college grades for women and the Vocabulary test. 
The other coefficients fall between these two values. 


TABLE 2


CORRELATIONS BETWEEN SOPHOMORE CULTURE TEST AND INDIVIDUAL MEASURES IN
THE FRESHMAN TESTING BATTERY


                                                Total    Men          Women

Contemporary Affairs test ..................     .81     .81           .82
Minnesota College Aptitude test (1926) .....     .77     [illegible]   .76
Cooperative Vocabulary test ................     .68     .63           [illegible]
Minnesota College Aptitude test (AM) .......     .67     .68           .66
Cooperative English test ...................     .58     .62           .66
High school percentile rank ................     .29     .43           .21


Contrast these correlation coefficients with those in Table 
2. With the single exception of the high school percentile 
rank, correlations between the Sophomore Culture test and 
the various measures range from .82 to .58. The Contem- 
porary Affairs test has high predictive value, as have the 
College Aptitude tests and the Vocabulary test. The nature of 
the distribution of the English test scores accounts for the 
lower coefficient for the total group than for either the men 
or the women. It is especially noteworthy that high school 
percentile ranks have little predictive value for such a criterion. 
In terms of forecasting efficiencies for the total group, the 
highest coefficient in Table 1 corresponds to 15 per cent, while 
the highest in Table 2 corresponds to 41 per cent.17

The same trend appears when the multiple correlation 
coefficients of selected batteries are compared. Table 3 reveals 
the order of correlation between the two criteria and two sets 
of entrance tests. Battery A, composed of three measures 
(high school percentile rank, Minnesota College Aptitude test 





17Forecasting efficiency computed by the formula $E = 100\,(1 - \sqrt{1 - r^{2}})$,
which gives a measure of the per cent of improvement over non-test prediction.
See J. P. Guilford, Psychometric Methods (New York: McGraw-Hill, 1936),
p. 363.




—form 1926, and English test), correlates .64 with grades, 
but .77 with the Sophomore Culture test. The corresponding in- 
dexes of forecasting efficiencies are 23 per cent and 36 per cent. 
When the Contemporary Affairs test is added to the three 
other measures (Battery B), the correlation with honor-point 
ratio becomes .67, and with the Sophomore Culture test .86. 
The corresponding forecasting efficiencies are 26 per cent and 
50 per cent better than non-test prediction. 
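
The forecasting efficiencies quoted for the total group follow from the formula in footnote 17, E = 100(1 - sqrt(1 - r^2)). A small Python sketch of the conversion is given here as an illustration only. The function name is ours, and the coefficients are the ones quoted in the text. Rounded to whole numbers, these five coefficients give 15, 41, 23, 36, and 26 per cent.

```python
import math


def forecasting_efficiency(r):
    """Per cent improvement over non-test prediction: 100 * (1 - sqrt(1 - r^2))."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


# Coefficients quoted in the text for the total group.
quoted = [
    ("Table 1, highest (high school rank vs. grades)", 0.52),
    ("Table 2, highest (Contemporary Affairs vs. Culture test)", 0.81),
    ("Battery A vs. grades", 0.64),
    ("Battery A vs. Culture test", 0.77),
    ("Battery B vs. grades", 0.67),
]

for label, r in quoted:
    print(f"{label}: r = {r:.2f}, E = {forecasting_efficiency(r):.1f} per cent")
```

Because E grows slowly for small r and quickly for large r, a moderate gain in correlation at the upper end produces a large gain in forecasting efficiency.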


TABLE 3


MULTIPLE CORRELATION COEFFICIENTS BETWEEN BATTERIES OF SELECTED ENTRANCE
TESTS AND TWO YEAR HONOR-POINT RATIO, AND SOPHOMORE CULTURE TEST


                              Two Year Honor-Point Ratio     Sophomore Culture Test
                              Total     Men     Women        Total     Men     Women

Entrance Battery A* ......     .64      .67      .64          .77      .78     [illegible]
Entrance Battery B† ......     .67      .63      .66          .86      .86     .87


*Entrance Battery A: high school percentile rank, Minnesota College Aptitude
test (1926), and Cooperative English test.

†Entrance Battery B: high school percentile rank, Minnesota College Aptitude
test (1926), Cooperative English test, and Contemporary Affairs test.


The differences between these multiple correlations for the 
two kinds of criteria were tested for significance. The differences
for Battery A were in the area of doubtful validity
(P < .05 but > .02); those for Battery B were well beyond
the boundary for significance (P < .01).18

The significant results of this exploratory study in the pre- 
diction of college success may be summarized as follows: (a) 
The substitution of the Sophomore Culture test for the con- 
ventional grading system as a criterion of college achievement 
markedly increases the predictive validities of the standardized 
entrance tests and markedly decreases the predictive validity 
of high school grades. (b) The lowest validity coefficients 
were obtained when high school percentile ranks were corre- 
lated with the Sophomore Culture battery. The highest zero 
order coefficients were obtained in correlating the Sophomore 
Culture battery with the Contemporary Affairs test. Eckert 


18R. A. Fisher, Statistical Methods for Research Workers (7th ed.; London:
Oliver and Boyd, 1939), p. 209.




reports similar findings. She concludes: “Students most conversant
with the achievements and thoughts of the past, and
most outstanding in the realm of book-learning, tend on the
whole to be those most alert to the contemporary scene.”19
(c) A combination of four entrance measures returned validity 
coefficients with the Sophomore Culture test corresponding to 
50 per cent forecasting efficiency. The Contemporary Affairs 
test alone correlated higher with the Culture test than did a 
combination of three measures. 

These results may be interpreted as follows. The Sophomore Culture
test correlates high with the other objective tests because of its
close similarity to them in objectivity of form, its greater relevancy
and comprehensiveness, and the overlap in the content
and abilities measured; it correlates low with high-school
grades because the latter appraise other areas besides those
involved in the test sampling of achievement. Grades in
college, conversely, correlate higher with grades in high 
school, and lower with the standardized tests because they 
measure areas outside of tested achievement but similar to 
those measured by high-school grades. This interpretation can 
be further supported and extended since the correlation be- 
tween the Sophomore Culture test and the two year honor- 
point ratio was only moderately high: .58 for men and women 
combined, .64 for men, and .51 for women. Scholastic grades 
and the Culture test, even when they presumably sample the 
same areas of knowledge, certainly do not measure all of the 
same areas or abilities involved in academic achievement in 
college. 

These results are not without precedent. For example, 
Frasier and Heilman reported correlations between the Thorn- 
dike Intelligence Examination and grades assigned subjectively 
and objectively. The average coefficients were .45 and .60,
respectively.20 For grades in French as assigned in the usual
manner, Tharp found a correlation of .47 with the Iowa Placement
test for foreign language aptitude. When an achievement
test was used, the correlation jumped to .64.21


19Eckert, op. cit., p. 135.

20G. W. Frasier and J. D. Heilman, “Experiments in Teacher College
Administration, III: Intelligence Tests,” Educational Administration and
Supervision, XIV (1928), 268-78.

From our examination of the problem of prediction, we 
draw the conclusion that a fruitful point of attack is through 
the substitution of a more reliable and therefore more predictable
measure of achievement. This paper has presented data
which definitely demonstrate that a pencil-and-paper evaluation
instrument such as the Sophomore Culture test is more
predictable than the time-honored grade criterion. But it
would be foolhardy indeed for the authors to take the next
step, that of advocating that this attribute alone justifies its
substitution for honor-point ratio. This decision lies within
the province of the educational administrator. He must decide
whether more accurate prediction (a sine qua non of all efficient
admissions policies) plus the Culture test’s degree of
relevance is sufficient to outweigh those desirable qualities 
which may still be claimed for the traditional marking system. 
In short, he must decide whether this new criterion is more 
acceptable than the old. 

A final word for research. The Sophomore Culture test, 
in common with other achievement tests, largely measures 
recall of information.22 That information is only one phase
of education must be recognized. Other components of cul- 
tural growth — attitudes, values, motivations, goals, and affec- 
tive experience — must be measured by other instruments. It 
is hoped that in the not-too-distant future these important out- 
comes of education can be appraised with sufficient accuracy 
so that we may know how well the American college functions 
as the vehicle of culture. 





21J. B. Tharp, “Sectioning Classes in Romance Languages,” Modern
Language Journal, XII (1927), 95-114.

22B. E. Cureton, “Evaluation or Guidance—A Report of the 1939 Sopho- 
more Testing Program,” Journal of Experimental Education, VIII (1940), 
308-40. 


A FACTOR ANALYSIS OF A NON-VERBAL 
REASONING TEST 


ROBERT I. BLAKEY 
Social Security Board 


SOME time ago Dr. Andrew W. Brown and the author
constructed a “Non-Verbal Reasoning Test” for use at
the high school level. A preliminary report of its construction
is being published by The Journal of Educational Psychology.
The present article concerns itself with the results of a factor
analysis of the intercorrelations between the subtests rather
than with the actual standardization of the test.

The test was constructed with the idea that it should 
measure in a non-verbal manner the higher intellective proc- 
esses of comprehension, mental alertness, deductive reasoning, 
inductive reasoning, and spatial relations or analysis. The pri- 
mary purpose of this study is to isolate and identify any 
common factors present and to compare them with the ex- 
pected factors. 

Other problems which may be considered in the light of 
the factor analysis are: (a) a comparison of the factorial 
composition of tests which are variations of Thurstone’s tests 
with the factorial composition, as determined by Thurstone, 
of the tests he used; (b) a reconsideration of the perennial 
problem of the existence of a general factor of mental ability; 
(c) the comparison of the factors found in this group of 
tests with factors found in analyses of other tests; (d) a fur- 
ther examination of various methods of ascertaining the num- 
ber of factors which should be taken out of a correlation 
matrix. 




All tests are time-limit tests and were introduced by fore- 
exercises which were explained by the examiner. They were 
presented in the order listed. 

1. Manikin—a page of pied figures of little men. The 
figures are simple line drawings with variations in the positions 
of arms and legs. The problem is to draw a ring around each 
manikin which is exactly like a model at the top of the page. 
It was thought that this test might be saturated with the 
Perceptual Speed factor. The Spearman-Brown corrected 
reliability is .81. 

2. Identical Patterns—12 rows of patterns formed by 
overlapping geometrical forms. The first pattern of each 
row is separated from the others by a heavy vertical line. 
The patterns are in 12 variations each composed of two circles 
and two right triangles. The same size forms are used in 
each variation, the differences being due to relative positions 
of the components and whether the forms are solid or dotted 
lines. Each row contains one or more patterns exactly like 
the first one in the row, and the problem is to place a mark
under each pattern which is exactly like the first one in its
respective row. It was thought that this test would be a variation
of Thurstone’s Identical Forms test and consequently
loaded with the Perceptual Speed factor. The Spearman- 
Brown corrected reliability is .98. 


3. Fitting Parts—each item consists of a solid black 
geometrical form, which has been cut into three parts, and 
four outlined figures, one of which is the same size and shape 
as the black figure which was cut. The problem is to indicate 
that one of the outline forms into which the solid black 
pieces could be made to fit exactly. Discrimination of both 
size and shape is involved for each item. It was thought that 
possibly the factor Visualization or Space was involved in the 
solution of this test. The Spearman-Brown corrected reliabil- 
ity of the 12-item test is .47. 

4. Opposite Sides—each item consists of three drawings 
identical in size and shape. The problem is to select the 
drawing in each item which is a mirror image of the other 




two drawings. Each drawing is a little pennant the shape 
of a non-isosceles right triangle and may be rotated in any 
position. It was thought that possibly Space and Induction 
might be used in the solution of this test. There is no really 
parallel form to this test although the idea was adopted from 
Thurstone’s Flags test. The Spearman-Brown corrected re- 
liability is .88. 


5. Code—a code consisting of eight boxes divided in 
half is placed at the top of the test. Each box has a unique 
group of squares and circles in the top half and an unusual 
group of triangles in the bottom half. Below the “‘code” are 
five rows of the little boxes, some exactly like the boxes in 
the code and some with incorrect pairing of the symbols. 
The problem is to place a line under each box which is different 
from the code. It was thought that the test might contain the 
Perceptual factor. The Spearman-Brown corrected reliability 
is .96. 


6. Circle Grouping—each item consists of four boxes 
containing little groups of circles. The grouping varies from 
box to box. One circle in each of the first three boxes is 
blackened according to a system. The problem is to discover 
that system and apply it in blackening a circle in the fourth 
box. It was thought that possibly Induction would be involved 
in solving this test. The Spearman-Brown corrected reliability 
for the 12-item test is .98. 


7. Form Series—this test is the usual series type with 
only three meaningless forms used in combination. One figure 
in each row is omitted and a blank inserted. The problem is 
to indicate which form belongs in the blank. It was thought 
that Deductive Reasoning or Inductive Reasoning would be 
involved in the solution of this test. The corrected Spearman- 
Brown reliability of the 22-item test is .86. 


8. Circle Reasoning—a variation of the Marks test used 
by Thurstone as a measure of Inductive Reasoning. There 
are five rows of groups of circles and dashes. The grouping 
changes from row to row. One circle in each of the first 




four rows is blackened according to a rule. The problem is 
to find the rule and apply it in blackening a circle in the fifth 
row. It was assumed that this test would contain Induction. 
The corrected reliability is .94. 

9. Form Relations—this test is a parallel form of Thur- 
stone’s Pattern Analogies test. The problem is to indicate 
one of five choices which bears the same relation to the third 
figure as the second bears to the first. Inductive Reasoning 
or Deductive Reasoning was assumed to be necessary for the 
solution of this test. The corrected reliability is .97. 

10. Form Reasoning—at the top of the test is a table 
showing how any two of seven forms could be combined to 
equal another one of the seven. Each item consists of three 
of the forms in a row. The task is to combine the first two 
forms according to the table and then combine the resulting 
form with the third to equal another form, the final result 
to be indicated by underlining one of five choices. It was 
thought that possibly Deductive Reasoning would be used 
to solve these problems. The Spearman-Brown corrected 
reliability for the 12-item test is .98. 
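
Each reliability above is described as "Spearman-Brown corrected." The article does not say how the part-tests were formed, so the following Python sketch assumes the common split-half case, in which the correlation between two half-tests is stepped up to the reliability of the full-length test. The function name and the half-test value are hypothetical.

```python
def spearman_brown(r_part, k=2.0):
    """Step up a part-test reliability to a test k times as long.

    With k = 2 this is the usual split-half correction 2r / (1 + r).
    """
    return k * r_part / (1.0 + (k - 1.0) * r_part)


# Hypothetical half-test correlation, chosen only for illustration.
r_half = 0.68
r_full = spearman_brown(r_half)  # 2 * 0.68 / 1.68
```

The correction always raises a positive part-test correlation, and the gain is largest for mid-range values.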


The Subjects 


The subjects were 286 high school pupils from a school 
in a suburb of Chicago. All tests were given by two experi- 
enced examiners in a well-lighted room. All tests were admin- 
istered in one 40-minute period. Eighty per cent of the whole 
group was between 15 and 18 years of age. The mean Otis 
I.Q. was 114. About 54 per cent of the group were boys.
No sex difference was found for combined scores on the whole 
test. No grade difference was statistically significant. The 
correlation of total test score with chronological age was —.13 
for the age range of this group. 


The Factor Analysis 


The table of intercorrelations (Table 1) was computed 
with the aid of Computing Diagrams for the Tetrachoric 
Correlation Coefficient (2). Correlations obtained in this 




manner are considered by Thurstone (6, p.58) to be applica- 
ble to factor analysis. In effect the scores are normalized in 
the process of correlation. 

The factors (Table 2) were extracted by the Thurstone 
centroid method. Here the problem of the number of factors




















TABLE 1 
INTERCORRELATIONS OF TESTS 

Variable 1 2 3 4 5 6 7 8 9 10 
Mamikin- ....6660%%. MBM a es SS SB 2 we ae. AS 
Identical Patterns.... .24 OS: TY 22 OG); 6. BS SS et 
Fitting Parts........ .27-—-«.08 Ad ae OO ES: AO DO Be 
Opposite Sides....... i, en SMS f AS 3260: BSS. Se OP 
es tee cw sida ee eee’? aes Be 26. a2: 25. 35 38 
Circle Grouping..... AS 46 2 25 . 26 At 3 53 AS 
Form Series......... 3 9G 85.38 | 22 AB 35 (52 5% 
Circle Reasoning..... M9 35. 40: 32 BE SO 8S SS 38 
Form Relations...... Ze SS - Ze 29. SS: 58 Set SS 40 
Form Reasoning..... AD 20 22 3h 38 49° Se SS AO 

TABLE 2
CENTROID MATRIX (F)

                                                 Factors
Variable                  Code No.      I       II      III      IV       V

Manikin ................      1       .438    -.435    -.183    -.083    -.069
Identical Patterns .....      2       .452    -.141     .274     .263    -.200
Fitting Parts ..........      3       .335    -.212    -.101    -.140     .087
Opposite Sides .........      4       .499     .100    -.163    -.112    -.242
Code ...................      5       .506    -.297    -.055    -.079     .130
Circle Grouping ........      6       .701     .138     .296     .272     .109
Form Series ............      7       .622     .377     .110    -.274    -.117
Circle Reasoning .......      8       .602     .281    -.252     .238     .205
Form Relations .........      9       .728     .181    -.154     .177    -.093
Form Reasoning .........     10       .665     .119     .166    -.251     .239


appeared. Two methods of determining the number of factors 
had been tried by the author (1) previously with some degree 
of success. One of these, Tucker’s empirical criterion, gave 
negative results in the present case. The other, Coombs’ 
criterion (3), postulates that in a 10-variable problem the
last factor of value will leave a table of residuals which, when 
signs are changed, will contain more than 31 negative entries 
with a standard error of five. Table 3 shows the application 
of Coombs’ criterion to this analysis: 













TABLE 3
COOMBS’ CRITERION

Factor                Negatives
I .......................  24
II ......................  33
III .....................  24
IV ......................  28
V .......................  35


It was obvious from the number of relatively large 
residuals remaining in the table after the second factor was 
extracted that there were more than two factors in the table. 
This was borne out in the subsequent analysis, which was 
carried to five factors. The indication that the fifth factor 
was the last one of value seems to have been verified in the 
analysis. The standard deviation of the fifth factor residuals 
before sign change is .028, which is considerably smaller than 
the standard error of a zero correlation for a population 
of 286. 
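
The comparison in the preceding paragraph can be checked with a short calculation. The sketch below assumes the familiar large-sample value 1/sqrt(N - 1) for the standard error of a zero product-moment correlation. The article does not state which formula was used, and the standard error of a tetrachoric coefficient is generally larger, which would only strengthen the comparison.

```python
import math

n_subjects = 286      # number of pupils reported in the text
residual_sd = 0.028   # standard deviation of the fifth-factor residuals

# Assumption: large-sample standard error of r when the true correlation is zero.
se_zero_r = 1.0 / math.sqrt(n_subjects - 1)

print(f"standard error of a zero correlation: {se_zero_r:.3f}")
print(f"residual s.d. as a fraction of that error: {residual_sd / se_zero_r:.2f}")
```

On this assumption the residual standard deviation is a little under half the standard error.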

For the rotation of factors in order to secure bounding 
hyperplanes, Thurstone’s method of lengthened vectors was 
used (4). The criteria of maximizing the number of zeros 
and rotating to a postulated positive manifold were the deter- 
miners for direction of rotation. Seven rotations were neces- 
sary and a “clean-up” rotation with actual length vectors was 
made. The rotated factorial matrix is given in Table 4. 
The rotational matrix of direction cosines is given in Table 5. 
The intercorrelations between the rotated factors are pre- 
sented in Table 6. 








TABLE 4
ROTATED FACTORIAL MATRIX (FΛ)

                                                 Factor
Variable                  Code No.      A        B        C        D        E

Manikin ................      1       .582     .075     .004     .192     .054
Identical Patterns .....      2       .092     .547    -.016     .239     .014
Fitting Parts ..........      3       .345    -.041    -.009     .265     .005
Opposite Sides .........      4       .132     .028     .160     .313     .408
Code ...................      5       .440     .067     .004     .394    -.062
Circle Grouping ........      6      -.076     .436     .161     .641    -.073
Form Series ............      7      -.141     .040     .016     .639     .453
Circle Reasoning .......      8      -.010    -.046     .561     .518     .026
Form Relations .........      9       .076     .162     .415     .507     .244
Form Reasoning .........     10      [loadings on A through D illegible]    .071













TABLE 5
TRANSFORMATION MATRIX (Λ)

                              Reference Vector
Centroid Axis           A        B        C        D        E

I ..................   .287     .247     .207     .801     .203
II .................  -.859    -.243     .361     .248     .379
III ................  -.380     .671    -.690     .234    -.224
IV .................  -.184     .521     .582    -.257    -.436
V ..................   .033    -.399     .111     .421    -.759

TABLE 6
CORRELATIONS BETWEEN NORMALS TO THE PLANES (Λ′Λ)

                              Plane
Plane           A        B        C        D        E

A ..........  1.000
B ..........  -.084    1.001
C ..........  -.092    -.241    1.000
D ..........  -.011    -.007    -.009    1.001
E ..........  -.127    -.117    -.005    -.003    1.001


Even a cursory glance at the rotated matrix will show 
that the factorial composition of the tests is not so simple 
as had been hoped for. 
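
The relation among Tables 2, 4, 5, and 6 can be sketched in plain Python: the rotated matrix is the product of the centroid matrix and the transformation matrix, and the table of correlations between normals is the product of the transformation matrix with its own transpose. The numbers below are transcribed from Tables 2 and 5 as read from the printed page, so they should be treated as an assumption. In particular, the entry for centroid axis V under reference vector C is read here as .111. The helper names are ours.

```python
# Centroid loadings F (Table 2): rows are tests 1-10, columns are factors I-V.
F = [
    [0.438, -0.435, -0.183, -0.083, -0.069],  # 1 Manikin
    [0.452, -0.141, 0.274, 0.263, -0.200],    # 2 Identical Patterns
    [0.335, -0.212, -0.101, -0.140, 0.087],   # 3 Fitting Parts
    [0.499, 0.100, -0.163, -0.112, -0.242],   # 4 Opposite Sides
    [0.506, -0.297, -0.055, -0.079, 0.130],   # 5 Code
    [0.701, 0.138, 0.296, 0.272, 0.109],      # 6 Circle Grouping
    [0.622, 0.377, 0.110, -0.274, -0.117],    # 7 Form Series
    [0.602, 0.281, -0.252, 0.238, 0.205],     # 8 Circle Reasoning
    [0.728, 0.181, -0.154, 0.177, -0.093],    # 9 Form Relations
    [0.665, 0.119, 0.166, -0.251, 0.239],     # 10 Form Reasoning
]

# Transformation matrix (Table 5): rows are centroid axes I-V, columns are A-E.
LAM = [
    [0.287, 0.247, 0.207, 0.801, 0.203],
    [-0.859, -0.243, 0.361, 0.248, 0.379],
    [-0.380, 0.671, -0.690, 0.234, -0.224],
    [-0.184, 0.521, 0.582, -0.257, -0.436],
    [0.033, -0.399, 0.111, 0.421, -0.759],    # third entry is an assumed reading
]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


V = matmul(F, LAM)               # rotated loadings, to be compared with Table 4
C = matmul(transpose(LAM), LAM)  # cosines between reference vectors, cf. Table 6

for code, row in enumerate(V, start=1):
    print(code, " ".join(f"{value:6.3f}" for value in row))
```

Under these readings the product agrees with the legible entries of Table 4 to about the third decimal, for example the Manikin loading on factor A and the Identical Patterns loading on factor B.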


Factor “A” has three variables with significant projections;
all the others are essentially zero. These are:


1. Manikin ................ .58
3. Fitting Parts .......... .35
5. Code ................... .44


Either one of two interpretations could be placed on this 
factor. It might be considered to be Space as has been de- 
scribed by Thurstone (6), the author (1), and others. Under 
this interpretation it would seem that the grasping of spatial 
relations of the arms and legs of the manikins was of more 
importance than the quick perception of small differences. It 
would appear also, that the quick comparison of the code 
with the stimuli in the Code test was not so important in 
solving the problem as the grasping of the relationship be- 
tween the two halves of the individual elements. 


The other interpretation which could be placed on this 
factor is that it is Perceptual Speed, or rather mental alert- 




ness, as distinguished from Perceptual Discrimination. Under 
this interpretation the ability would involve the quick change 
of response from item to item with only the simplest dis- 
crimination necessary. Thurstone’s factor “9” in his study 
of Hyde Park High School in Chicago seems to have some 
of the characteristics of factor “A” (5). In this case, the 
test Scattered “x’’s had the highest loading. The Manikin 
test has the simplest discrimination level and the Fitting Parts 
test the most complex of those listed. The author prefers 
this latter interpretation. 


Factor “B” has two tests which have significant loadings:


2. Identical Patterns ..... .55
6. Circle Grouping ........ .44


It seems obvious that this factor corresponds to Thur- 
stone’s (6) Perceptual Speed factor, but we shall call it 
Perceptual Discrimination to distinguish it from factor “A.” 
The difference here is that the emphasis is on analytic per- 
ception in which a fine discrimination must be made rather 
than on speedy response to a simple stimulus. Speed is of 
importance, but in the subjects used the differences in the 
mental process of perceptual discrimination will contribute 
more to performance variance than will simple speed. 


At first glance it seems surprising that Circle Grouping 
is high on this factor. However, a careful subjective analysis 
of the test will indicate that the problems involved are more 
those of perceptual discrimination than of induction. The 
figures are complex but the rules to be brought out are simple. 
For example, one of the items has the middle dot blackened 
in a group of three, which is apparent even at a glance, so 
that the problem resolves into finding the correct group in 
the response square. This takes a discriminatory ability 
evidently slightly below that required for Identical Patterns. 


Factor “C” has two variables with significant projections:


8. Circle Reasoning ....... .56
9. Form Relations ......... .42




























Both of these tests are variations of tests used by Thur- 
stone in his studies of the primary mental abilities and have 
been interpreted to contain Induction, or Inductive Reasoning. 
This interpretation is suitable in the present case. The ap- 
parent paradox that test 8 contains Induction while test 6, 
in which a supposedly similar function is involved, does not 
may be resolved when an inspection is made of the tests them- 
selves. The primary problem in test 8 is to find a rule by 
which the problem may be solved while in test 6 the main 
problem, as has been said before, is to find the response group 
rather than the rule. 


Factor “D” is an orthogonal factor which was set up by 
making its normal perpendicular to the normals of all the 
other planes. This was necessary as one dimension of the 
five-dimensional system could not be identified by a bounding 
hyperplane because of lack of variables with zero projections 
in that dimension. It is the same type of problem as was 
encountered by the author in a former study (1). 
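
The construction described above, a normal made perpendicular to the normals of all the other planes, can be illustrated with a small numerical sketch. It is not the author's computing procedure: the four normals below are hypothetical values chosen only for illustration, and the Gram-Schmidt process is one of several ways of finding the remaining perpendicular direction in a five-dimensional space.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [a / length for a in v]

def perpendicular_to_all(normals):
    """Return a unit vector perpendicular to every vector in `normals`."""
    dim = len(normals[0])
    # Orthonormalize the given normals, which may be oblique to one another.
    basis = []
    for n in normals:
        r = list(n)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        if math.sqrt(dot(r, r)) > 1e-12:
            basis.append(normalize(r))
    # Remove the spanned part from each coordinate axis; keep the largest residual.
    best = None
    for i in range(dim):
        e = [1.0 if j == i else 0.0 for j in range(dim)]
        for b in basis:
            c = dot(e, b)
            e = [x - c * y for x, y in zip(e, b)]
        if best is None or dot(e, e) > dot(best, best):
            best = e
    return normalize(best)

# Hypothetical normals for the four bounded hyperplanes (not taken from the study).
normals = [
    [1.0, 0.2, 0.0, 0.0, 0.1],
    [0.1, 1.0, 0.3, 0.0, 0.0],
    [0.0, 0.2, 1.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 1.0, 0.2],
]
d_normal = perpendicular_to_all(normals)
```

The scalar product of `d_normal` with each of the four normals is zero within rounding error, which is the sense in which the fifth factor is orthogonal to the others.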


All the variables have projections on this factor which 
are probably significant. The relative amount of projection 
seems to increase with the greater complexity of the mental 
function involved. The tenth test, Form Reasoning, which 
involves the synthesis of geometrical figures according to estab- 
lished rules (not unlike arithmetic), has much the highest 
saturation of the factor. 


The obvious comment, and one that must be reckoned 
with, is that this factor represents “general intelligence,” or 
Spearman’s factor “g.” As has been said before, there is 
nothing in the Thurstone method of analysis which denies 
that such a general factor exists or implies that it would not 
show up if present. However, in regard to the nature of the 
present factor, there can be little doubt that it is “general” 
for this battery of tests and is not an effect of maturation or 
lack of differentiation of ability due to the youth of the subjects. 
What it is called (comprehension, understanding, 
mental efficiency, or intelligence) is beside the point. Due to 
the popular misconceptions and scientific vagueness of the last 
term, it probably would be better to adopt some other name. 


It should be understood that the author is of the opinion 
that the above-mentioned effect of an augmented general factor 
due to lack of maturation is applicable to situations in which 
the subjects are immature, but that such a factor does not 
account for appreciable distortion in the present case. It is not 
denied that such a general factor is present in tests given to 
children, but it seems probable that the general factor, if it 
exists in such a case, is unduly emphasized by the maturation 
curves of the abilities. 


Another interpretation which might be placed on factor 
“D” is that it is Deductive Reasoning, which in each test 
requires that the subject must base his conclusions or responses 
on certain facts which are presented in the test item. How- 
ever, this is probably another aspect of the foregoing 
discussion. 


Factor “E” has significant loadings for two tests and a 
possibly significant loading for a third: 


[test name illegible] ....... .41 
[test name illegible] ....... .45 
[test name illegible] ....... .24 


This factor apparently corresponds with none of the fac- 
tors previously identified by Thurstone and his associates. 
However, it may possibly represent Deductive Reasoning as 
“series” tests have been found by Thurstone (5) to contain 
a component of Deductive Reasoning. The same is true of 
the form relations type of test. The relationship of the 
Opposite Sides test to such an interpretation is not immediately 
apparent. Assuming that one might consider two figures in 
each item of the Opposite Sides test as facts to be compared 
and from which a conclusion might be drawn concerning the 
third figure, i.e., whether it is different from the first two or 
like one of them, then it might be thought to involve Deduc- 
tion. In the Form Series test the symbols presented are facts 
from which a conclusion must be drawn concerning the missing 
figure. (The conclusion is definitely limited to three alternatives, 
each of which might be tried in turn.) In the Form Relations 
test the problem might be approached by trying to find the 
rule involved, which would be Induction, or by substituting the 
possible answers one at a time and testing the resulting equa- 
tion. This latter process might be considered to be Deductive 
Reasoning and insofar as it were used would cause the test to 
show a loading on the Deduction factor. 


No definite conclusion can be made as to the identity 
of Factor “E,” but tentatively it may be called Deductive 
Reasoning. 


Despite the fact that the factorial composition of some 
of the tests varies somewhat from what was originally supposed, 
it seems that the tests, as a group, do measure some 
of the higher mental processes of reasoning. From the amount 
of projection on the general factor, it would seem that the 
tests saturated with Perceptual Speed are the poorest measures 
of the higher intellective processes. It would appear that test 
number 9, Form Relations, which has significant projections 
on three factors, is probably the best general test of all the 
reasoning processes. Test 10, Form Reasoning, is the best 
test of the general factor, which might be considered to be 
synonymous with comprehension or mental efficiency or intelligence. 
The test Identical Patterns seems to be saturated 
with the factor Perceptual Discrimination, which is interpreted 
quite similarly to Thurstone’s factor of Perceptual 
Speed, and is consistent with Thurstone’s (5) test of Identical 
Forms, which is parallel in process. The test Circle Reasoning, 
a variation of Thurstone’s (5) Marks test, is similar in 
factorial composition to the latter. The Form Relations test 
seems to have a heterogeneous factorial makeup, as was also 
found by Thurstone (6). 

The factors identified seem to be consistent with those 
identified by Thurstone (6,5) except for the general factor. 
It is necessary to investigate these tests in a larger battery 
before an interpretation can be adequately applied to the gen- 
eral factor. This factor has some characteristics similar to 
those found by the author (1) in factor “D” in a “Reanalysis 
of a Test of the Theory of Two Factors.” The factor Perceptual 
Speed also seems similar to the factor “C” in the 
latter study. 

The factors have been found to be practically uncorrelated, 
the highest correlation, that between factors “B” and “C,” 
being only 14 degrees off orthogonality. This is probably 
within chance variation and no significance is attached to it. 
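
A departure from orthogonality stated in degrees can be turned into a correlation by taking the cosine of the angle between the two axes. The sketch below assumes unit-length axes; the text does not say on which side of a right angle the 14 degrees fall, so both cases are computed.

```python
import math

def correlation_from_angle(degrees):
    """Cosine of the angle between two unit-length axes."""
    return math.cos(math.radians(degrees))

deviation = 14.0
r_acute = correlation_from_angle(90.0 - deviation)   # axes 76 degrees apart
r_obtuse = correlation_from_angle(90.0 + deviation)  # axes 104 degrees apart
print(round(r_acute, 2), round(r_obtuse, 2))
```

Under these assumptions the correlation is about .24 in absolute value, which gives a sense of the size of the relationship that the author treats as within chance variation.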


REFERENCES 


1. Blakey, R. I. “A Reanalysis of a Test of the Theory of Two 
Factors,” Psychometrika, V (1940), 121-36. 


2. Chesire, L., Saffir, M., and Thurstone, L. L. Computing 
Diagrams for the Tetrachoric Correlation Coefficient. Chicago: 
University of Chicago Press, 1933. 59 pages. 


3. Coombs, Clyde. Unpublished paper read before the American 
Psychological Association, September, 1940. 


4. Thurstone, L. L. “A New Rotational Method in Factor Anal- 
ysis,” Psychometrika, III (1938), 199-218. 


5. Thurstone, L. L. “Experimental Study of Simple Structure,” 
Psychometrika, V (1940), 153-68. 


6. Thurstone, L. L. Primary Mental Abilities. Chicago: Uni- 
versity of Chicago Press, 1938. 121 pages. 






















NEW TESTS* 


California Capacity Questionnaire, by Elizabeth T. Sullivan, 
Willis W. Clark, and Ernest W. Tiegs. 1941. For high 
school and college students, and adults. Time, 30 minutes. 
Forms A and B; 75¢ per 25; 25¢ per specimen set. Pub- 
lished by the California Test Bureau, 3636 Beverly Boule- 
vard, Los Angeles, California. 





California Test of Personality, by Louis P. Thorpe, Willis 
W. Clark, and Ernest W. Tiegs. 1940. One form each 
for primary, elementary, intermediate, secondary, and 
adult levels. Time, about 45 minutes for each series. 
Primary series for grades 1-3; elementary series for grades 
4-9; intermediate series for grades 7-10; secondary series 
for grades 9-14; adult series; $1.00 per 25 of each series; 
25¢ per specimen set of each series. Published by the Cali- 
fornia Test Bureau, 3636 Beverly Boulevard, Los Ange- 
les, California. 





Cooperative Community Affairs Test, by Roy A. Price and 
Robert F. Steadman. 1941. Time, 30 minutes. Form R; 
$3.50 per 100; 25¢ per specimen set. Published by the 
Cooperative Test Service, 15 Amsterdam Avenue, New 
York, New York. 





Cooperative Literary Comprehension and Appreciation Test, 
by Hyman Eigerman, Mary Willis, and Frederick B. 
Davis. 1941. For upper high school and college classes. 
Time, 40 minutes. Form R; $4.50 per 100; 25¢ per specimen 
set. Published by the Cooperative Test Service, 15 
Amsterdam Avenue, New York, New York. 

*Publishers and authors of new tests are requested to send copies to The 
Editor, Educational and Psychological Measurement, Box 766, Alexandria, Va. 





Cooperative Science Test, by John G. Zimmerman and Rich- 
ard E. Watson. 1941. For grades 7, 8, and 9. Time, 
80 minutes. Form R; $5.50 per 100; 25¢ per specimen 
set. Published by the Cooperative Test Service, 15 Am- 
sterdam Avenue, New York, New York. 





Cooperative Social Studies Test, by Agatha Townsend and 
Mary Willis. 1941. For grades 7, 8, and 9. Time, 80 
minutes. Form R; $5.50 per 100; 25¢ per specimen set. 
Published by the Cooperative Test Service, 15 Amsterdam 
Avenue, New York, New York. 





Dunlap Academic Preference Blank, by Jack W. Dunlap. 
1940. For grades 7, 8, and 9. Forms A and B; 90¢ per 
25; 20¢ per specimen set. Published by the World Book 
Company, Yonkers, New York. 





Eames Eye Test, by Thomas H. Eames. 1940. $3.50 for 
examiner’s kit; 65¢ per 25 individual record cards. Pub- 
lished by the World Book Company, Yonkers, New York. 





Examination for the Measurement of the Efficiency of Mental 
Functioning, by Harriet Babcock and Lydia Levy. 1940. 
One form; set of test materials, $11.20; record blanks, 
$2.30 per 25, $6.90 per 100. Published by C. H. Stoelting 
Company, 424 North Homan Avenue, Chicago, Illinois. 





Fourth Grade Geography Test, by Zoe A. Thralls, George 
Miller, and Marguerite Uttley. 1940. For use at the 
end of the fourth grade. Time, 35 minutes. One form; 
8¢ per test; 4¢ per manual; 20¢ per scoring stencil. Pub- 
lished by McKnight and McKnight, Bloomington, Illinois. 




Hills Economics Test, by John R. Hills. 1940. For high 
school and college students. Time, 40 minutes. One form; 
50¢ per 25; 15¢ per specimen set. Published by Bureau 
of Educational Measurements, Kansas State Teachers 
College, Emporia, Kansas. 





Kansas Vocabulary Test, by H. E. Schrammel, O. M. Ras- 
mussen, Anna Huebert, and D. J. Tate. 1940. For grades 
4 to 8. Forms A and B; 40¢ per 25; 15¢ per specimen set. 
Published by Bureau of Educational Measurements, Kan- 
sas State Teachers College, Emporia, Kansas. 





Kirkpatrick Chemistry Test, by Ernest Kirkpatrick. 1940. 
For high school students. Time, 40 minutes. One form; 
60¢ per 25; 15¢ per specimen set. Published by Bureau 
of Educational Measurements, Kansas State Teachers 
College, Emporia, Kansas. 





Kniss World History Test, by F. Roscoe Kniss. 1940. For 
high school students. Time, 50 minutes. Forms A and B; 
$1.30 per 25; 20¢ per specimen set. Published by the 
World Book Company, Yonkers, New York. 





Mechanical Comprehension Test, by George K. Bennett. 
1940. For male high school students and adults. Time, 
about 25 minutes. One form; $2.50 per 25 booklets and 
answer sheets; 25¢ per specimen set. Published by the 
Psychological Corporation, 522 Fifth Avenue, New York, 
New York. 





Minnesota Personality Scale, by John G. Darley and Walter 
J. McNamara. 1941. For upper high school and college 
students. Time, about 45 minutes. Separate question 
booklets for men and women; answer sheet can be used 
with either question booklet; scorable by International 
Test Scoring Machine; $1.50 per 25 question booklets; 
75¢ per 25 answer sheets; 35¢ per specimen set. Published 
by the Psychological Corporation, 522 Fifth Avenue, New 
York, New York. 





Mordy-Schrammel American Government Test, by F. E. 
Mordy and H. E. Schrammel. 1940. For high school and 
college students. Time, 40 minutes. One form; 50¢ per 
25; 15¢ per specimen set. Published by Bureau of Edu- 
cational Measurements, Kansas State Teachers College, 
Emporia, Kansas. 





Mordy-Schrammel Constitution Test, by F. E. Mordy and 
H. E. Schrammel. 1940. For high school and college 
students. Time, 40 minutes. One form; 50¢ per 25; 15¢ 
per specimen set. Published by the Bureau of Educational 
Measurements, Kansas State Teachers College, Emporia, 
Kansas. 





Peabody Library Information Test, by Louis Shores and Jo- 
seph E. Moore. 1940. One form each for college, high 
school, and elementary school levels. Time, 30 minutes. 
College level: one form; $1.00 per 25. High school level: 
one form; 75¢ per 25. Elementary school level: one form; 
60¢ per 25; 20¢ per specimen set. Published by the Edu- 
cational Test Bureau, 720 Washington Avenue, S.E., 
Minneapolis, Minnesota. 





Rasmussen Trigonometry Test, by O. M. Rasmussen and O. 
J. Peterson. 1940. For high school and college students. 
Time, 40 minutes. One form; 50¢ per 25; 15¢ per speci- 
men set. Published by Bureau of Educational Measure- 
ments, Kansas State Teachers College, Emporia, Kansas. 




Stanford Achievement Test, by Truman L. Kelley, Lewis M. 
Terman, and Giles M. Ruch. 1941. Forms D and E for 
each of primary, intermediate, and advanced levels from 
grades 2 to 9. Primary Battery, for grades 2 and 3: 
time, 50 minutes; $1.10 per 25; 20¢ per specimen set. 
Intermediate Battery—Complete, for grades 4 to 6: time, 
150 minutes; $2.00 per 25; 40¢ per specimen set. Ad- 
vanced Battery—Complete, for grades 7 to 9: time, 150 
minutes; $2.00 per 25; 40¢ per specimen set. Published 
by the World Book Company, Yonkers, New York. 





Tate Economic Geography Test, by D. J. Tate and G. A. 
Buzzard. 1940. For high school and college students. 
Time, 50 minutes. Forms A and B; 50¢ per 25; 15¢ per 
specimen set. Published by Bureau of Educational Meas- 
urements, Kansas State Teachers College, Emporia, 
Kansas. 





Trusler-Arnett Health Knowledge Test, by V. T. Trusler, 
C. E. Arnett, Jr., and H. E. Schrammel. 1940. For 
grades 9 to 12 and college. Time, 50 minutes. Forms A 
and B; 50¢ per 25; 15¢ per specimen set. Published by 
Bureau of Educational Measurements, Kansas State 
Teachers College, Emporia, Kansas. 





Turse Shorthand Aptitude Test, by Paul L. Turse. 1940. For 
use with high school students before enrolling in shorthand 
courses. Time, 45 minutes. One form; $1.30 per 25; 
10¢ per specimen set. Published by the World Book Com- 
pany, Yonkers, New York. 





Vocational Inventory, by Curtis G. Gentry. 1940. For high 
school and college students, and adults. Time, about 150 
minutes. One form; 15¢ for vocational inventory, individual 
analysis report, and individual score tabulation 
sheet; 25¢ per sample set. Published by the Educational 
Test Bureau, 720 Washington Avenue, S.E., Minneapolis, 
Minnesota. 






















MEASUREMENT ABSTRACTS* 


Adkins, Dorothy C. and Kuder, G. Frederic. “The Relation 
of Primary Mental Abilities to Activity Preferences.” 
Psychometrika, V (1940), 251-62. 


The relations of abilities, as measured by Thurstone’s 
Tests for Primary Mental Abilities, to activity preferences, as 
measured by Kuder’s Preference Record, are investigated for 
a population of 512 university freshmen. Ability profiles for 
contrasted groups on each preference scale reveal relatively 
slight overlapping between the two sets of measures, although 
the apparent trends are reasonable. The Pearson intercor- 
relation coefficients of all pairs of measures involved were 
determined. Implications of the findings in relation to theory 
and to educational and vocational guidance are indicated. 
(Courtesy Psychometrika.) 





Allison, G. and Barnett, A. “Freshman Psychological Exam- 
ination Scores as Related to Size of High Schools.” Jour- 
nal of Applied Psychology, XXIV (1940), 651-52. 


Quantitative and linguistic scores of 1,083 college fresh- 
men on the 1938 edition of the A.C.E. Test were analyzed 
with reference to the size of the high schools from which 
they graduated. For three size-groups, statistically significant 
differences in means were found in five of six comparisons. 
Means tend to increase with enrollment but there is much 
overlapping. W. A. Varvel. 





Anderson, H. A. and Traxler, A. E. “The Reliability of the 
Reading of an English Essay Test.” Part II. School 
Review, XLVIII (1940), 521-30. 





*Edited by Professor Forrest A. Kingsbury. 




Factual notes were prepared on two themes: “The Dis- 
covery of Gold in California” (Form A), and “The Pony 
Express” (Form B). A group of 281 pupils in the University 
High School of the University of Chicago were given the 
two forms at one year’s interval with instructions to expand 
the material into a two-hour essay. The essays were graded 
on a sixty-point scale with the following weights for the sep- 
arate factors: completeness (6), spelling (6), punctuation 
(6), language errors (6), coherence between main divisions 
(10), organization of paragraphs (10), and organization of 
essay sentences (10). On rereading 70 essays of each form 
the grades of a skilled reader showed correlations of .893 ± 
.016 and .937 ± .010 for the two forms; two readers, on 
first scoring of 25 papers, showed correlations of .859 ± .035 
and .898 ± .026 for the two forms. For individual factors, 
no correlation was below .80. Growth in language ability may 
be indicated by an average gain of 3.3 points from Form A 
to Form B for 281 pupils. The results are not deemed con- 
clusive but only suggestive of the desirability of experimenta- 
tion with essay-test procedures. J. E. Karlin. 
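
The second figure attached to each correlation above agrees with the probable error of a correlation coefficient as it was conventionally computed at the time, 0.6745(1 - r²)/√N. The abstract does not name the formula, so the sketch below is an assumption that happens to reproduce the four reported values from the stated r and N.

```python
import math

def probable_error_of_r(r, n):
    """Conventional probable error of a correlation coefficient."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# (r, N, reported figure) as given in the abstract.
reported = [
    (0.893, 70, 0.016),
    (0.937, 70, 0.010),
    (0.859, 25, 0.035),
    (0.898, 25, 0.026),
]
computed = [round(probable_error_of_r(r, n), 3) for r, n, _ in reported]
for (r, n, given), pe in zip(reported, computed):
    print(f"r = {r:.3f}, N = {n}: computed {pe:.3f}, reported {given:.3f}")
```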


Babitz, Milton and Keys, Noel. “A Method for Approximating 
the Average Inter-Correlation Coefficient by Correlating 
the Parts with the Sum of the Parts.” Psychometrika, 
V (1940), 283-88. 

It is noted that the average inter-item correlation, which 
represents the internal consistency of a test, yields a unique 
estimate of test reliability. A close approximation to this 
average is given by a formula which requires the correlation 
of each item with the total score and the standard deviation 
of each item. The formula is especially useful in those in- 
stances where the number of items is small and where the 
variation in item sigmas should not be neglected. (Courtesy 
Psychometrika.) 
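
The link between the average inter-item correlation and test reliability can be sketched with the generalized Spearman-Brown formula. This is not the Babitz-Keys approximation itself, which the abstract does not reproduce; the function name and the example values are illustrative assumptions.

```python
def reliability_from_mean_r(mean_r, k):
    """Spearman-Brown estimate of the reliability of a k-item test
    from the average inter-item correlation."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)

# Hypothetical example: ten items with an average inter-item correlation of .20.
estimate = reliability_from_mean_r(0.20, 10)
print(round(estimate, 3))
```

With few items the estimate changes considerably with each item added, which may be one reason a careful approximation of the average matters most for short tests.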


Benton, A. L. and Perry, J. D. “A Study of the Predictive 
Value of the Stanford Scientific Aptitude Test (Zyve).” 
Journal of Psychology, X (1940), 309-12. 

Scores on the Stanford Scientific Aptitude Test and the 
A.C.E. Psychological Examination (1934-35) together with 
course grades for 43 students over a period of three to four 
years were used in an investigation of the predictive value of 
the Aptitude Test. The average score on the A.C.E. was 
approximately one sigma above the mean for the 1935 freshmen. 
Correlations of course grades for scientific and non-scientific 
courses with the Aptitude Test and the A.C.E. Test 
were about +.35, there being no significant difference between 
the sets of correlations. The coefficients of correlation of 
the 11 subtests and average grades in all college courses “were 
all quite low.” The authors suggest “that the test has a certain 
limited value in prognosticating the scholastic achievement 
of freshman and sophomore students.” Harold Bechtoldt. 





Buros, Oscar Krisen, Editor. The Nineteen Forty Mental 
Measurements Yearbook. Highland Park, N. J.: The 
Mental Measurements Yearbook; pp. 674 + xxxiii. 1941. 


The first part of the Yearbook contains reviews of new 
tests as well as of selected older tests. There are 524 tests 
listed. Most of these are reviewed by two or three reviewers. 
The second part lists 368 books and pamphlets in the measure- 
ment field and excerpts from reviews of them which have 
been published in various journals. 


Cast, B. M. D. “The Efficiency of Different Methods of 
Marking English Composition.” Part II. British Journal 
of Educational Psychology, X (1940), 49-60. 

Forty English compositions were marked by 12 examiners 
by four different methods: (1) the examiner’s own habitual 
method; (2) the method of general impression; (3) Burt’s 
analytic method (allotting separate marks for specified points 
or qualities); (4) Hartog’s achievement method. The P-technique 
(correlation of persons) was combined with Burt’s 
summation method for a factorial analysis of the correlations 
between examiners; this resulted in: (a) a general factor 
(representing the best approximation to the “true marks”) 
accounting for 50 per cent of the variance; (b) a dichotomous factor 
of examiners marking better by analytic methods or by intuitive 
or impressionistic methods. The methods of marking for 
general use are here found to be in order of preference: the 
‘analytic’ method, the method of general impression, the 
examiner’s habitual method, and Hartog’s achievement 
method. J. E. Karlin. 





Daniel, C. “Statistically Significant Differences in Observed 
Per Cents.” Journal of Applied Psychology, XXIV 
(1940), 826-30. 

A table gives the amount by which a per cent A observed 
in one sample must exceed a per cent B observed in another 
sample of the same size to be significant at the 0.05 level. 
It is presented for different values of B and for samples from 
20 to 1,000. Various conditions are stated and the meaning 
and use of the table discussed. W. A. Varvel. 
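
A table of this kind can be approximated with the usual normal-curve test for the difference between two proportions from samples of equal size. The abstract does not give Daniel's method or his stated conditions, so the pooled standard error, the critical value of 1.96, and the function names below are all assumptions made for illustration.

```python
import math

def z_for_difference(pct_a, pct_b, n):
    """Critical ratio for two per cents observed in samples of size n each."""
    pa, pb = pct_a / 100.0, pct_b / 100.0
    pooled = (pa + pb) / 2.0
    se = math.sqrt(2.0 * pooled * (1.0 - pooled) / n)
    return (pa - pb) / se

def smallest_significant_a(pct_b, n, critical_z=1.96):
    """Smallest per cent A (to one decimal) exceeding B significantly."""
    for tenths in range(int(pct_b * 10) + 1, 1000):
        pct_a = tenths / 10.0
        if z_for_difference(pct_a, pct_b, n) >= critical_z:
            return pct_a
    return None

# The required excess over B = 50 per cent shrinks as the samples grow.
for n in (20, 100, 1000):
    print(n, smallest_significant_a(50, n))
```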





Davis, F. B. “The Interpretation of I.Q.’s Derived from the 
1937 Revision of the Stanford-Binet Scales.” Journal of 
Applied Psychology, XXIV (1940), 595-604. 

The author presents a table of equivalent values for I.Q.’s 
from the 1916 and 1937 revisions of the Stanford-Binet and 
suggests a new classification of I.Q.’s based on the 1937 form. 
The method by which the equivalency was calculated is discussed. 
The suggested classification of I.Q.’s provides a series 
of equal steps or gradations of brightness. W. A. Varvel. 





Dongan, K. E. and Gory, A. E. “Selecting Unskilled Laborers 
in Cincinnati.” Public Personnel Review, I, No. 3 (1940), 
43-50. 

Job analyses were made of jobs for unskilled laborers 
as eligible lists became needed. It was agreed that the ability 
to read and write, a good physique, intelligence, experience, 
and an age range of 21 to 45 or 50 were required for the jobs. 

An examination for waste collector included a practical 
test calling for repeating a demonstration given by regular 
workers, an evaluation of training and experience, and an oral 
interview. A test for street cleaners was composed of 75 
multiple-choice items on arithmetic, vocabulary, reasoning, and 
general information. These questions were put in the language 
of laborers. 

Examining for unskilled labor positions has gone on only 
since February, 1940. The departments, however, believe 
they are getting better workers. 





Dressel, Paul L. “Some Remarks on the Kuder-Richardson 
Reliability Coefficient.” Psychometrika, V (1940), 305-10. 

The Kuder-Richardson reliability coefficient is derived in 
a manner independent of that originally given. Various alternative 
forms applicable to special situations are exhibited 
with the purpose of making them available to others interested 
in using this formula. A simplification in computation is suggested 
for use with a calculating machine. (Courtesy 
Psychometrika.) 
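
For reference, the best-known form of the coefficient, Kuder-Richardson formula 20, can be computed directly from right-wrong item scores. The sketch below shows that standard form only, not Dressel's derivation or his alternative forms, and the small data set is hypothetical.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20.

    item_scores: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)

# Hypothetical scores of five examinees on four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
coefficient = kr20(data)
print(round(coefficient, 2))
```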





Ferguson, George A. “The Application of Sheppard’s Correction 
for Grouping.” Psychometrika, VI (1941), 21-7. 

This paper attempts to show in a non-mathematical way 
the influence of grouping on standard deviations and correlations, 
and advances empirical evidence to illustrate with what 
accuracy values corrected for grouping by Sheppard’s correction 
approximate values obtained from ungrouped data when 
the distributions are continuous. This inquiry gained its initial 
stimulus from the observation that many standard deviations 
and correlations reported by students of psychology and education 
are uncorrected for grouping and that frequently errors 
attributed to the grouping of data are not small when compared 
with errors of sampling. (Courtesy Psychometrika.) 
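
Sheppard’s correction subtracts one-twelfth of the squared class interval from a variance computed on grouped data. The sketch below applies it to a standard deviation and to a correlation, on the usual understanding that the product-moment covariance needs no such correction; the numerical values are hypothetical.

```python
import math

def sheppard_corrected_sd(grouped_sd, interval_width):
    """Standard deviation corrected for grouping into equal class intervals."""
    corrected_var = grouped_sd ** 2 - interval_width ** 2 / 12.0
    return math.sqrt(corrected_var)

def sheppard_corrected_r(r, sd_x, width_x, sd_y, width_y):
    """Correlation recomputed with corrected standard deviations."""
    cx = sheppard_corrected_sd(sd_x, width_x)
    cy = sheppard_corrected_sd(sd_y, width_y)
    return r * (sd_x * sd_y) / (cx * cy)

# Hypothetical grouped data: SD of 10 with a class interval of 5 on both variables.
sd_corrected = sheppard_corrected_sd(10.0, 5.0)
r_corrected = sheppard_corrected_r(0.50, 10.0, 5.0, 10.0, 5.0)
print(round(sd_corrected, 3), round(r_corrected, 3))
```

The corrected standard deviation is always smaller than the grouped one, so the corrected correlation is always somewhat larger in absolute value.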


Godard, R. H. and Lindquist, E. F. “An Empirical Study of 
the Effect of Heterogeneous Within-Groups Variance upon 
Certain F-Tests of Significance in Analysis of Variance.” 
Psychometrika, V (1940), 263-74. 

In the application of the analysis of variance to data 
obtained in educational methods experiments which involve 
several classes of several schools, one assumption is that of 
homogeneity in the variances of pupil scores from school to 
school. It is shown that such variances on representative 
educational achievement tests are heterogeneous. The effects 
of this heterogeneity upon the F-tests of significance commonly 
employed in methods experiments are investigated by com- 
paring the actual distribution of F values for a large number 
of “experiments” involving marked heterogeneity with a 
theoretical distribution based on the assumption of homo- 
geneity. Although the findings, which vary somewhat with the 
type of variance ratio, are not entirely conclusive, they appar- 
ently demonstrate that departure from homogeneity does not 
invalidate the use of the customary F-tests for evaluating 
results of the typical methods experiment. (Courtesy 
Psychometrika.) 





Goodenough, Florence L. and Maurer, Katharine M. “The 
Relative Potency of the Nursery School and the Statistical 
Laboratory in Boosting the I.Q.” Journal of Educational 
Psychology, XXXI (1940), 541-49. 

This study recomputed data obtained at the Minnesota 
Nursery School by those statistical procedures generally 
employed in the Iowa statistical laboratory. In the Iowa 
procedure, cases were grouped according to initial I.Q. instead 
of paternal occupation. This recomputation of data, which 
when handled properly showed no effect of nursery school 
training upon the I.Q., gave results similar to those reported 
from Iowa. A difference in I.Q. appeared for children who 
remained at home as well as for nursery school children. The 
authors conclude that the previously reported differences are 
the result of fallacious statistical treatment rather than being 
an educational phenomenon. D. A. Peterson. 





Guilford, J. P. “The Phi Coefficient and Chi Square as Indices 
of Item Validity.” Psychometrika, VI (1941), 11-9. 

Two new methods of item analysis are described. One 
involves the computation of the φ coefficient (correlation of 
a fourfold point distribution) and the other involves chi 
square. The only data required are the proportions of passing 
individuals in the upper and lower criterion groups, for the 
determination of φ, and in addition, N, for the determination 
of chi square. Abacs are presented for graphic solution of the 
two indices of validity, and tests of significance are provided. 
(Courtesy Psychometrika.) 
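
When the upper and lower criterion groups are of equal size, both indices follow from the two proportions passing. The sketch below uses the standard fourfold-table algebra under that equal-size assumption; it is not a reproduction of Guilford's abacs, and the example figures are hypothetical.

```python
import math

def phi_from_proportions(p_upper, p_lower):
    """Phi coefficient for an item, given the proportions passing in
    upper and lower criterion groups of equal size."""
    p = (p_upper + p_lower) / 2.0  # proportion passing in both groups combined
    return (p_upper - p_lower) / (2.0 * math.sqrt(p * (1.0 - p)))

def chi_square_from_phi(phi, n_total):
    """Chi square for a fourfold table with n_total cases (1 degree of freedom)."""
    return n_total * phi * phi

# Hypothetical item: 80 per cent pass in the upper group, 40 per cent in the lower,
# with 50 cases in each group.
phi = phi_from_proportions(0.80, 0.40)
chi_square = chi_square_from_phi(phi, 100)
print(round(phi, 3), round(chi_square, 2))
```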





Jenkins, R. L. “Considerations Relative to the Selection of an 
Index of Intelligence.” Journal of Educational Psychology, 
XXXI (1940), 527-40. 

The test-retest stability of the I.Q. and the P.C. (Heinis 
personal constant) are compared in terms of Binet test ratings 
for 1,774 cases. The group was weighted with retarded 
children. Comparisons of all adjacent tests were made. 
Regression of both I.Q.’s and P.C.’s toward the mean on 
retest was found with marked drops in the P.C.’s of very 
bright children. “The P.C. appears to offer no advantage 
over the I.Q. for the children of the middle-age group. . . .” 
and appears to be slightly inferior to the I.Q. at the lower 
age levels. 


The rationales underlying the two statistics are considered. 
Dispersions in intelligence are assumed in both cases to be 
proportional to the mental age. The growth functions assumed 
by the I.Q. and the P.C. are presented with the point that 
both curves have one degree of freedom. 


It is suggested that a more logical approach would be to 
express “mental test performance in terms of the sigma value 
of test score for the chronological age.” The assumption is 
less restrictive than those for the constancy of the I.Q. or 
P.C.; the assumption is “that the relative status of children 
with respect to intelligence remains constant,” which is 
“implicit in the use of any index of intelligence for predictive 
purposes.” This index avoids the logical fallacy involved in 
adult mental ages. It is pointed out that the growth function 
may be a two-parameter curve which would not interfere with 
the use of sigma values, but would reduce the value of a single 
parameter statistic. Harold Bechtoldt. 




Page, J. D. “The Effect of Nursery-School Attendance Upon 
Subsequent I.Q.” Journal of Psychology, X (1940), 
221-30. 

Stanford-Binet I.Q.’s of 72 children in kindergarten to the 
fifth grade who had previously attended nursery school 125 
to 525 days were compared with those of adjacent older sib- 
lings who had not attended preschool. One hundred children 
of like age and socio-economic status were also compared with 
their adjacent older siblings, none of either group having 
attended preschool. No significant differences in I.Q. could 
be referred to nursery-school attendance. A slight advantage 
of younger siblings in both experimental and control groups 
was explained by age fluctuations in the standardization of 
the L form of the Stanford-Binet. No relation was found 
between duration of nursery-school attendance and subsequent 
I.Q. advantage. The mean I.Q. difference between sibling 
pairs approximated 10 points. W. A. Varvel. 





Powell, N. J. “Check List for Use in Civil Service Objective 
Test Preparation.” Public Personnel Quarterly, II (1940-41), 
13-6. 

The article includes a list of questions which have been 
developed for reviewing civil service objective tests before 
they are finally used. Its use is intended to “increase the prob- 
ability that no major basis upon which the test will be 
appraised has been ignored in the test construction.” Points 
to be checked are listed under the following headings: validity, 
cost, appearance of test, typography, and administration. A 
number of questions applying specifically to completion items 
and multiple-choice items are also listed. 





Roff, Merrill. “Linear Dependence in Multiple Correlation 
Work.” Psychometrika, V (1940), 295-98. 

The problem in multiple correlation work of nonsense 
results attributable to linear dependence of variables, which 
has been discussed by Ragnar Frisch in relation to economic 
data, is presented from the standpoint of its significance in 
psychological research. It is shown that a symmetric correlation 
determinant with unity in the diagonal cells can vanish 
only when there is a first-order or partial correlation of unity 
between one pair of the variables. On the basis of this result, 
it is argued that the problem should be expected to cause less 
difficulty in the field of psychology than in economics and that 
psychologists should be able to avoid the pitfall by bringing 
to bear their knowledge of the variables with which they are 
working. (Courtesy Psychometrika.) 
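
The result summarized above can be illustrated with a small numerical sketch. The case is hypothetical and is not taken from Roff's paper: x and y are assumed to be uncorrelated with equal variance, and z is defined as their sum, so the three variables are linearly dependent. The function names are illustrative only.

```python
import math

def det3(r12, r13, r23):
    """Determinant of a 3x3 correlation matrix with unity in the diagonal."""
    return 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Assumed case: x, y uncorrelated with equal variance, and z = x + y,
# which gives r_xz = r_yz = 1 / sqrt(2).
r_xy = 0.0
r_xz = r_yz = 1 / math.sqrt(2)

determinant = det3(r_xy, r_xz, r_yz)            # vanishes (up to rounding)
r_xz_given_y = partial_corr(r_xz, r_xy, r_yz)   # equals unity (up to rounding)

print(f"determinant = {determinant:.6f}")
print(f"r_xz.y      = {r_xz_given_y:.6f}")
```

In this sketch no pair of variables has an ordinary correlation of unity, yet the determinant vanishes because the partial correlation of x and z with y held constant is unity.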





Royer, Elmer B. “A Machine Method for Computing the 
Biserial Correlation Coefficient in Item Validation.” 
Psychometrika, VI (1941), 55-9. 

A method for computing the biserial correlation coefficient 
with the aid of punch-card equipment is outlined. A numer- 
ical example and a work sheet layout are included in the 
presentation. (Courtesy Psychometrika.) 
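
For readers unfamiliar with the coefficient, a minimal sketch of the usual textbook formula follows. It does not reproduce Royer's punch-card procedure or work sheet; the function name and the six scores are hypothetical. The formula assumed is r = ((M_p - M_q) / s) (pq / y), where M_p and M_q are the mean total scores of those passing and failing the item, s is the standard deviation of all scores, p and q are the proportions passing and failing, and y is the ordinate of the unit normal curve at the point dividing p from q.

```python
from statistics import NormalDist, mean, pstdev

def biserial(scores, passed):
    """Biserial correlation between total scores and a pass/fail item."""
    hi = [s for s, ok in zip(scores, passed) if ok]
    lo = [s for s, ok in zip(scores, passed) if not ok]
    p = len(hi) / len(scores)      # proportion passing the item
    q = 1 - p                      # proportion failing the item
    unit_normal = NormalDist()     # mean 0, standard deviation 1
    y = unit_normal.pdf(unit_normal.inv_cdf(p))  # ordinate at the dividing point
    return (mean(hi) - mean(lo)) / pstdev(scores) * (p * q / y)

# Hypothetical data: six examinees' total scores and their result on one item.
scores = [2, 4, 5, 5, 6, 8]
passed = [False, False, True, False, True, True]
r_bis = biserial(scores, passed)
print(round(r_bis, 3))
```

The sketch requires that at least one examinee pass and at least one fail the item, since the normal ordinate is undefined when p is 0 or 1.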





Ryans, David G. The First Step in Guidance: Self-Appraisal. 
New York, Cooperative Test Service. 35 pp. 1941. 

A report of the 1940 Sophomore testing program in which 
the following tests were used: Cooperative English Test, Form 
Q; Cooperative General Culture Test, Form Q; and Coopera- 
tive Contemporary Affairs Test, Form 1940. 





Sisk, H. L. “A Note on the Comparative Value of the ‘True’ 
Index of Studiousness for the Purpose of Prognosis.” 
Journal of Psychology, X (1940), 275-78. 

The scholastic achievement of 585 university freshmen was 
predicted from Symonds’ “true” Index of Studiousness and 
from a battery of aptitude, English, and reading tests. The 
battery was found to give a more reliable prediction of first 
semester grades. W. A. Varvel. 





Stoy, E. G. “Selection of Key-Punch Operators.” Journal of 
Applied Psychology, XXIV (1940), 653-54. 

These are notes on preliminary experimentation in the 
selection of key-punch operators. Four tests warrant further 
consideration: “an eye-hand coordination test in which letter 
combinations involving both hands are registered on counters, 
a test of verbal and spatial memory, a clerical type of test, and 
an arithmetic test.” W. A. Varvel. 





Swineford, Frances and Holzinger, Karl J. “Selected Refer- 
ences on Statistics, the Theory of Test Construction, and 
Factor Analysis.” School Review, XLVIII (1940), 
460-66. 

Articles covering the year March, 1939, to February, 
1940, are presented with brief notes as to the nature of the 
problem handled in each paper. Twelve articles are given 
under the heading “Theory and Use of Statistical Methods,” 
18 under “Problems of Test Construction,” and 16 under 
“Factor Analysis.” Harold Bechtoldt. 





Thurstone, L. L. “A Factorial Study of Visual Gestalt 
Effects.” Psychometrika, V (1940), 315-16. (Abstract 
of a paper read at the September, 1940, meeting of the 
American Psychological Association.) 





Toolon, W. T. “Essential Factors in Test Construction.” 
Personnel Journal, XIX (1940), 204-08. 

The value of careful “informal examination” of test items 
before and after statistical treatment is pointed out, and an 
analysis of the nature of the items and of the errors made 
is suggested. Factors dealt with include item difficulty, item 
correlations, closeness of distractors, and the judgment and 
information of the subject. Harold Bechtoldt. 





Tucker, Ledyard R. “A Matrix Multiplier.” Psychometrika, 
V (1940), 289-94. 

A machine to expedite matrix multiplication has been 
developed by modifying the International Business Machines 
Corporation scoring machine. The principles and operation 
of the machine are described, and time and accuracy estimates 
are indicated. (Courtesy Psychometrika.) 







